RE: my first raid disaster on reboot :o( update

* RE: my first raid disaster on reboot :o( update
@ 2005-09-08 11:38 Ken Walker
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: Ken Walker @ 2005-09-08 11:38 UTC (permalink / raw)
  To: linux-raid

I'm getting confused again.

I installed Debian 3.1 onto two SCSI drives set up as raid1.

I also set up the four IDE drives during installation and set them as

/dev/md7 using /dev/hda,/dev/hdc
/dev/md8 using /dev/hdb,/dev/hdd

both ext3

they started, and sync'd

on reboot, md7 and md8 didn't auto start.

so i created them again with

mdadm -C /dev/md7 -l1 -n2 /dev/hda /dev/hdc

this started rebuilding.

then i did the same for md8

mdadm -C /dev/md8 -l1 -n2 /dev/hdb /dev/hdd

then i did 

mkfs.ext3 /dev/md7
mkfs.ext3 /dev/md8

I checked with fdisk that they were all set as type FD.

then i did

(i made a copy of the original mdadm.conf first.)

mdadm --detail --scan > mdadm.conf

And on reboot only md0 would mount.

So i copied the original mdadm.conf back and rebooted, and all the raids
apart from md7 and md8 started.

I noticed at the top of the original mdadm.conf i had the following

DEVICE partitions

so i did

mdadm --detail --scan > mdadm.conf

again, with md7 and md8 running and rebooted.

adding 

DEVICE partitions

back to the top

The system booted up but again without md7 or md8, it did its corrupt
superblock or ext2 file system complaints.

But I'm getting confused, 

because, on

http://www.linuxdevcenter.com/pub/a/linux/2002/12/05/RAID.html

which is where i got the 

mdadm --detail --scan > mdadm.conf

from, 

the example he gives

DEVICE	/dev/sdb1 /dev/sdc1
ARRAY 	/dev/md0 level=raid0 num-devices=2
UUID=410a299e:4cdd535e:169d3df4:48b7144a

is the other way round in my mdadm.conf file, i have

ARRAY 	/dev/md0 level=raid0 num-devices=2
UUID=410a299e:4cdd535e:169d3df4:48b7144a
DEVICE	/dev/sdb1 /dev/sdc1

Which way round should it be?
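
A minimal sketch of regenerating the file without losing the DEVICE line
follows. As far as I know, mdadm reads the whole file and the relative order
of DEVICE and ARRAY lines does not matter, but a continuation line (such as
the UUID= part) has to start with whitespace to be treated as part of the
previous line. The function name build_mdadm_conf is made up for
illustration, and the config path is an assumption (Debian typically uses
/etc/mdadm/mdadm.conf).

```shell
# Hypothetical helper: read ARRAY lines (e.g. from `mdadm --detail --scan`)
# on stdin and emit a complete config with the DEVICE line first.
build_mdadm_conf() {
    echo 'DEVICE partitions'
    cat
}

# Intended use (as root, with the arrays running); the path is an assumption:
#   mdadm --detail --scan | build_mdadm_conf > /etc/mdadm/mdadm.conf
```

Redirecting the scan output straight into the file with ">" replaces
everything that was there, which is how the DEVICE line got lost above.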

I have also read that an mdadm.conf file isn't really needed, but can be
helpful. If I hide my mdadm.conf file, will the system boot with md7 and md8?

I do have those two raids in my fstab file at the end as

/dev/md7	/Cad100	ext3	defaults  0 2
/dev/md8	/Cad200	ext3	defaults  0 2

Can anybody help :o(

Ken

-----Original Message-----
From: Ken Walker [mailto:ken.walker@manchester.ac.uk]
Sent: 06 September 2005 2:26 pm
To: linux-raid@vger.kernel.org
Subject: my first raid disaster on reboot :o(

I've got debian 3.1, kernel 2.6 installed on a machine with two 9.1g SCSI
and 4 160g IDE's.

The SCSI is split up into /  /usr  /var  /swap  /tmp  and /home, each set as
a raid1.

The IDE's are set up as raid1 on the ide channels, such that hda is mirrored
with hdc and hdb is mirrored with hdd.

I had to move the system today so powered down with shutdown -h now.

On reboot i just get / mounted ( i think ) and everything else says mdx
corrupt superblock or such and not a valid ext2 fs.

all the mirrors were set up as ext3 and when it was up and running
/proc/mdstat said all was well.

/etc/fstab has all the raids present.

I'm kinda stuck as to where to start.

Could anybody point me in the right direction please.

many thanks

Ken
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Drive fails & raid6 array is not self rebuild .
  2005-09-08 11:38 my first raid disaster on reboot :o( update Ken Walker
@ 2005-09-08 18:54 ` Mr. James W. Laferriere
  2005-09-08 19:34   ` Molle Bestefich
  2005-09-08 21:09   ` Neil Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 18:54 UTC (permalink / raw)
  To: linux-raid maillist

 	Hello All ,  Is there a documented procedure to follow during
 	creation or after that will get a raid6 array to self
 	rebuild ?

 	Why I am asking .
 	I was getting the errors below at a heavy rate , so ...

Sep  7 20:11:49 localhost kernel: scsi2 (2:0): rejecting I/O to dead device
Sep  7 20:11:49 localhost kernel: md: write_disk_sb failed for device sde
Sep  7 20:11:49 localhost kernel: md: excessive errors occurred during superblock update, exiting
Sep  7 20:11:49 localhost kernel: raid5: Disk failure on sde, disabling device. Operation continuing on 35 devices

 	I ran the below & the above messages stopped .  But the array
 	(appears to have) never tried rebuilding .

# mdadm --manage --fail /dev/md_d0 /dev/sde

 	The problem arose because the drive died totally . ie:
root@devel-0:/ # fdisk /dev/sde

Unable to open /dev/sde

# cat /proc/mdstat
...snip...
md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32] 
sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25] 
sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17] 
sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8] 
sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
       1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35] 
[UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
...snip...
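
The bracketed status string in the mdstat output above has one character per
array slot, with "_" marking a slot that has no working member. A small
sketch for picking out those slots is below; the function name missing_slots
is hypothetical and the string is passed in by hand rather than read from
/proc/mdstat.

```shell
# Print the zero-based positions of '_' (slots without a working member)
# in an mdstat status string such as "[UU_U]".
missing_slots() {
    printf '%s\n' "$1" | awk '{
        pos = 0
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "U" || c == "_") {
                if (c == "_")
                    out = out (out == "" ? "" : " ") pos
                pos++
            }
        }
        print out
    }'
}

missing_slots "[UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]"
```

For the string above this reports slot 2, which lines up with the sde[2](F)
entry in the member list.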

# cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4 UUID=2006d8c6:71918820:247e00b0:460d5bc1

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
@ 2005-09-08 19:34   ` Molle Bestefich
  2005-09-08 21:09   ` Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Molle Bestefich @ 2005-09-08 19:34 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

Mr. James W. Laferriere wrote:
> Is there a documented procedure to follow during
> creation or after that will get a raid6 array to self
> rebuild ?

MD will rebuild your array automatically, given that it has a spare disk to use.

> raid5: Disk failure on sde, disabling device. Operation continuing on 35 devices

Seems like a raid5, not raid6..

> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

No need to do any rebuilding on the remaining devices, since the data
on them are fine.

You've lost redundancy however, so you should add a new disk to the array ASAP.

With 35 disks, I'd recommend that you at least use raid6 in place of raid5..

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
  2005-09-08 19:34   ` Molle Bestefich
@ 2005-09-08 21:09   ` Neil Brown
  2005-09-08 21:39     ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-08 21:09 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Thursday September 8, babydr@baby-dragons.com wrote:
>  	Hello All ,  Is there a documented procedure to follow during
>  	creation or after that will get a raid6 array to self
>  	rebuild ?

I suspect a kernel upgrade would do the trick, though you don't say
what kernel you are running.
You could probably kick it along by removing and re-adding your spare:
  mdadm /dev/md_d0 --remove /dev/sdao
  mdadm /dev/md_d0 --add /dev/sdao

(And I assume you mean 'raid5' rather than 'raid6', not that it
matters..)

NeilBrown

> # cat /proc/mdstat
> ...snip...
> md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32] 
> sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25] 
> sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17] 
> sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8] 
> sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35] 
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 21:09   ` Neil Brown
@ 2005-09-08 21:39     ` Mr. James W. Laferriere
  2005-09-09  0:50       ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 21:39 UTC (permalink / raw)
  To: linux-raid maillist

 	Hello Neil ,  Inline .

On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>>  	Hello All ,  Is there a documented procedure to follow during
>>  	creation or after that will get a raid6 array to self
>>  	rebuild ?
> I suspect a kernel upgrade would do the trick, though you don't say
> what kernel you are running.
> You could probably kick it along by removing and re-adding your spare:
>  mdadm /dev/md_d0 --remove /dev/sdao
>  mdadm /dev/md_d0 --add /dev/sdao
>
> (And I assume you mean 'raid5' rather than 'raid6', not that it
> matters..)
 	Sorry ,  yes I meant raid5 .

 	My kernel version is .
root@devel-0:/ # uname -a
Linux devel-0 2.6.12.5 #1 SMP Fri Aug 26 20:09:46 UTC 2005 i686 GNU/Linux

 	When I try to do the remove I get .
root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
mdadm: hot remove failed for /dev/sdao: Device or resource busy

 	I should also have 3 other drives that are spares .  I could
 	try hot remove on one of them .  See at bottom the output of
 	mdadm --misc -Q --detail /dev/md_d0
 	Which is showing no spare drives ?  And I built it with 4
 	spares

root@devel-0:~ # cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4 UUID=2006d8c6:71918820:247e00b0:460d5bc1

 	 c-l is 10 devices (one is dead 'e' leaves 9) .
 	 n-w is 10 devices
 	 yz  is  2 devices
 	aa-h is  8 devices
 	aj-s is 10 devices
 		----------
 		40 devices given in mdadm.conf
 		-1 dead device .
 		----------
 		39 devices
 		36 devices used (per /proc/mdstat)
 		----------
 		 3 devices for spares .
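
The tally above can be re-checked mechanically. The sketch below counts the
letters covered by each pattern on the DEV line and compares the total with
the 36 members that /proc/mdstat lists (that list still includes the failed
sde). The helper name range_len is made up for illustration.

```shell
# Number of letters in an inclusive range, e.g. "range_len c l" gives 10.
range_len() {
    first=$(printf '%d' "'$1")    # character code of the first letter
    last=$(printf '%d' "'$2")     # character code of the last letter
    echo $(( last - first + 1 ))
}

# Patterns from the DEV line: sd[c-l] sd[n-w] sd[yz] sda[a-h] sda[j-s]
listed=$(( $(range_len c l) + $(range_len n w) + $(range_len y z) \
         + $(range_len a h) + $(range_len j s) ))
in_mdstat=36    # members shown in /proc/mdstat, failed sde included
echo "listed=$listed unbound=$(( listed - in_mdstat ))"
```

Counted this way, 40 devices are listed and 4 of them do not appear in
/proc/mdstat at all, namely /dev/sda[p-s]. The figure of 3 above comes from
subtracting the dead drive a second time.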

>> # cat /proc/mdstat
>> ...snip...
>> md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
>> sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
>> sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
>> sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
>> sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

/dev/md_d0:
         Version : 01.02.01
   Creation Time : Sun Aug 28 17:46:59 2005
      Raid Level : raid5
      Array Size : 1244826240 (1187.16 GiB 1274.70 GB)
     Device Size : 35566464 (33.92 GiB 36.42 GB)
    Raid Devices : 36
   Total Devices : 36
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Thu Sep  8 06:26:10 2005
           State : clean, degraded
  Active Devices : 35
Working Devices : 35
  Failed Devices : 1
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 64K

            Name :
            UUID : 2006d8c6:71918820:247e00b0:460d5bc1
          Events : 5308

     Number   Major   Minor   RaidDevice State
        0       8       32        0      active sync   /dev/sdc
        1       8       48        1      active sync   /dev/sdd
        0       0        0        0      removed
        3       8       80        3      active sync   /dev/sdf
        4       8       96        4      active sync   /dev/sdg
        5       8      112        5      active sync   /dev/sdh
        6       8      128        6      active sync   /dev/sdi
        7       8      144        7      active sync   /dev/sdj
        8       8      160        8      active sync   /dev/sdk
        9       8      176        9      active sync   /dev/sdl
       10       8      208       10      active sync   /dev/sdn
       11       8      224       11      active sync   /dev/sdo
       12       8      240       12      active sync   /dev/sdp
       13      65        0       13      active sync   /dev/sdq
       14      65       16       14      active sync   /dev/sdr
       15      65       32       15      active sync   /dev/sds
       16      65       48       16      active sync   /dev/sdt
       17      65       64       17      active sync   /dev/sdu
       18      65       80       18      active sync   /dev/sdv
       19      65       96       19      active sync   /dev/sdw
       20      65      128       20      active sync   /dev/sdy
       21      65      144       21      active sync   /dev/sdz
       22      65      160       22      active sync   /dev/sdaa
       23      65      176       23      active sync   /dev/sdab
       24      65      192       24      active sync   /dev/sdac
       25      65      208       25      active sync   /dev/sdad
       26      65      224       26      active sync   /dev/sdae
       27      65      240       27      active sync   /dev/sdaf
       28      66        0       28      active sync   /dev/sdag
       29      66       16       29      active sync   /dev/sdah
       30      66       48       30      active sync   /dev/sdaj
       31      66       64       31      active sync   /dev/sdak
       32      66       80       32      active sync   /dev/sdal
       33      66       96       33      active sync   /dev/sdam
       34      66      112       34      active sync   /dev/sdan
       40      66      128       35      active sync   /dev/sdao

        2       8       64        -      faulty spare   /dev/sde


-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 21:39     ` Mr. James W. Laferriere
@ 2005-09-09  0:50       ` Neil Brown
  2005-09-09  2:05         ` Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-09  0:50 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Thursday September 8, babydr@baby-dragons.com wrote:
> 
>  	When I try to do the remove I get .
> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
> mdadm: hot remove failed for /dev/sdao: Device or resource busy
> 
>  	I should also have 3 other drives that are spares .  I could
>  	try hot remove on one of them .  See at bottom the output of
>  	mdadm --misc -Q --detail /dev/md_d0
>  	Which is showing no spare drives ?  And I built it with 4
>  	spares

Yes... /dev/sda[pqrs] are missing.  I wonder why..

What does
   mdadm -E /dev/sda[pqrs]
show?
What happens if you then
  mdadm /dev/md_d0 -a /dev/sda[pqrs]
??

NeilBrown

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  0:50       ` Neil Brown
@ 2005-09-09  2:05         ` Mr. James W. Laferriere
  2005-09-09  2:15           ` Mr. James W. Laferriere
  2005-09-09  7:40           ` Neil Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09  2:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid maillist

 	Hello Neil ,

On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>>  	When I try to do the remove I get .
>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>
>>  	I should also have 3 other drives that are spares .  I could
>>  	try hot remove on one of them .  See at bottom the output of
>>  	mdadm --misc -Q --detail /dev/md_d0
>>  	Which is showing no spare drives ?  And I built it with 4
>>  	spares
>
> Yes... /dev/sda[pqrs] are missing.  I wonder why..
>
> What does
>   mdadm -E /dev/sda[pqrs]
> show?
 	See way below .

> What happens if you then
>  mdadm /dev/md_d0 -a /dev/sda[pqrs]
> ??

 	Getting stranger & stranger .

root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
mdadm: re-added /dev/sdap

root@devel-0:~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33] 
sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] 
sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] 
sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] 
sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
       1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35] [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

md1 : active raid1 sdb2[0] sda2[1]
       1003968 blocks [2/2] [UU]

md2 : active raid1 sdb3[0] sda3[1]
       34700288 blocks [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
       136448 blocks [2/2] [UU]

unused devices: <none>


 	It appears they think they're still part of the array .

root@devel-0:~ # mdadm -E /dev/sda[pqrs]
/dev/sdap:
           Magic : a92b4efc
         Version : 01.00
      Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
            Name :
   Creation Time : Sun Aug 28 17:46:59 2005
      Raid Level : raid5
    Raid Devices : 36

     Device Size : 71132943 (33.92 GiB 36.42 GB)
     Data Offset : 16 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : c083f71d:ce15a0aa:24341675:45ec6e3e
     Update Time : Sun Aug 28 20:43:06 2005
        Checksum : dc216e5 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdaq:
           Magic : a92b4efc
         Version : 01.00
      Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
            Name :
   Creation Time : Sun Aug 28 17:46:59 2005
      Raid Level : raid5
    Raid Devices : 36

     Device Size : 71132943 (33.92 GiB 36.42 GB)
     Data Offset : 16 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 430b9730:4416eb44:2f793e78:a3a92cc1
     Update Time : Sun Aug 28 20:43:06 2005
        Checksum : 4092a148 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdar:
           Magic : a92b4efc
         Version : 01.00
      Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
            Name :
   Creation Time : Sun Aug 28 17:46:59 2005
      Raid Level : raid5
    Raid Devices : 36

     Device Size : 71132943 (33.92 GiB 36.42 GB)
     Data Offset : 16 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 33ea7f64:976740bb:ff88e4bc:84534774
     Update Time : Sun Aug 28 20:43:06 2005
        Checksum : e2918b3d - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdas:
           Magic : a92b4efc
         Version : 01.00
      Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
            Name :
   Creation Time : Sun Aug 28 17:46:59 2005
      Raid Level : raid5
    Raid Devices : 36

     Device Size : 71132943 (33.92 GiB 36.42 GB)
     Data Offset : 16 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : acb2ea9d:7c3f3b6e:98d9f85c:c8cb2bae
     Update Time : Sun Aug 28 20:43:06 2005
        Checksum : a8eff479 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
root@devel-0:~ #
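
One thing that stands out in the -E output above: all four devices report
Events : 1 and an Update Time of Aug 28, while --detail on the array itself
reported Events : 5308. A short sketch for pulling the per-device event
counts out of such output is below. The function name events_per_device is
hypothetical, and it only assumes the "Events : N" layout shown above.

```shell
# Print "<device> <events>" for each device section of `mdadm -E` output
# read from stdin.
events_per_device() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }       # section header
        $1 == "Events" && $2 == ":" { print dev, $3 }    # event counter
    '
}

# Intended use (as root):
#   mdadm -E /dev/sda[pqrs] | events_per_device
```

A device whose count is far behind the array's has presumably not had its
superblock updated for a long time, which would fit spares that were never
bound when the array was assembled.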

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  2:05         ` Mr. James W. Laferriere
@ 2005-09-09  2:15           ` Mr. James W. Laferriere
  2005-09-09  7:40           ` Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09  2:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid maillist

 	Hello Neil ,
On Thu, 8 Sep 2005, Mr. James W. Laferriere wrote:
> On Fri, 9 Sep 2005, Neil Brown wrote:
>> On Thursday September 8, babydr@baby-dragons.com wrote:
>>>  	When I try to do the remove I get .
>>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>>
>>>  	I should also have 3 other drives that are spares .  I could
>>>  	try hot remove on one of them .  See at bottom the output of
>>>  	mdadm --misc -Q --detail /dev/md_d0
>>>  	Which is showing no spare drives ?  And I built it with 4
>>>  	spares
>> 
>> Yes... /dev/sda[pqrs] are missing.  I wonder why..
>> 
>> What does
>>   mdadm -E /dev/sda[pqrs]
>> show?
> 	See way below .
>
>> What happens if you then
>>  mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> ??
>
> 	Getting stranger & stranger .
>
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap

 	Are there any debugging options I can enable from the
 	boot: prompt or compile in ?

 	just some more info ...  Hth ,  JimL

# dmesg | tail -n 43
RAID5 conf printout:
  --- rd:36 wd:35 fd:1
  disk 0, o:1, dev:sdc
  disk 1, o:1, dev:sdd
  disk 3, o:1, dev:sdf
  disk 4, o:1, dev:sdg
  disk 5, o:1, dev:sdh
  disk 6, o:1, dev:sdi
  disk 7, o:1, dev:sdj
  disk 8, o:1, dev:sdk
  disk 9, o:1, dev:sdl
  disk 10, o:1, dev:sdn
  disk 11, o:1, dev:sdo
  disk 12, o:1, dev:sdp
  disk 13, o:1, dev:sdq
  disk 14, o:1, dev:sdr
  disk 15, o:1, dev:sds
  disk 16, o:1, dev:sdt
  disk 17, o:1, dev:sdu
  disk 18, o:1, dev:sdv
  disk 19, o:1, dev:sdw
  disk 20, o:1, dev:sdy
  disk 21, o:1, dev:sdz
  disk 22, o:1, dev:sdaa
  disk 23, o:1, dev:sdab
  disk 24, o:1, dev:sdac
  disk 25, o:1, dev:sdad
  disk 26, o:1, dev:sdae
  disk 27, o:1, dev:sdaf
  disk 28, o:1, dev:sdag
  disk 29, o:1, dev:sdah
  disk 30, o:1, dev:sdaj
  disk 31, o:1, dev:sdak
  disk 32, o:1, dev:sdal
  disk 33, o:1, dev:sdam
  disk 34, o:1, dev:sdan
  disk 35, o:1, dev:sdao
md: cannot remove active disk sdao from md_d0 ...
md: cannot remove active disk sdao from md_d0 ...
md: bind<sdap>
md: bind<sdaq>
md: bind<sdar>
md: bind<sdas>

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  2:05         ` Mr. James W. Laferriere
  2005-09-09  2:15           ` Mr. James W. Laferriere
@ 2005-09-09  7:40           ` Neil Brown
  2005-09-09 11:37             ` David M. Strang
  2005-09-09 20:07             ` Mr. James W. Laferriere
  1 sibling, 2 replies; 18+ messages in thread
From: Neil Brown @ 2005-09-09  7:40 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 1300 bytes --]

On Thursday September 8, babydr@baby-dragons.com wrote:
> > What happens if you then
> >  mdadm /dev/md_d0 -a /dev/sda[pqrs]
> > ??
> 
>  	Getting stranger & stranger .
> 
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap
> 

Hmm.. mdadm bug.

> root@devel-0:~ # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33] 
> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] 
> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] 
> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] 
> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
kernel.


Attached are three patches.
The first two are needed by 2.6.12.5 to make sure resync happens (this
is particularly a problem for version-1 superblocks) or just upgrade
to 2.6.13.

The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
actually adds all of them, and so that when you --assemble a version-1
array with spares, the spares actually get included.

NeilBrown



[-- Attachment #2: 349MdHotAddFix --]
[-- Type: text/plain, Size: 786 bytes --]

Status: ok

Make sure recovery happens when add_new_disk is used for hot_add

Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    2 ++
 1 files changed, 2 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c	2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
 		err = bind_rdev_to_array(rdev, mddev);
 		if (err)
 			export_rdev(rdev);
+
+		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		if (mddev->thread)
 			md_wakeup_thread(mddev->thread);
 		return err;

[-- Attachment #3: 418MdWakeThread --]
[-- Type: text/plain, Size: 1420 bytes --]

Status: ok

Make sure resync gets started when array starts.

We weren't actually waking up the md thread after setting
MD_RECOVERY_NEEDED when assembling an array, so it is possible to
lose a race and not actually start resync.

So add a call to md_wakeup_thread, and while we are at it, remove
all the "if (mddev->thread)" guards as md_wakeup_thread does its own
checking.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c	2005-08-26 17:00:30.000000000 +1000
+++ ./drivers/md/md.c~current~	2005-08-26 17:00:39.000000000 +1000
@@ -256,8 +256,7 @@ static inline void mddev_unlock(mddev_t 
 {
 	up(&mddev->reconfig_sem);
 
-	if (mddev->thread)
-		md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->thread);
 }
 
 mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
@@ -1726,6 +1725,7 @@ static int do_md_run(mddev_t * mddev)
 	mddev->in_sync = 1;
 	
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
 	
 	if (mddev->sb_dirty)
 		md_update_sb(mddev);
@@ -2255,8 +2255,7 @@ static int add_new_disk(mddev_t * mddev,
 			export_rdev(rdev);
 
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-		if (mddev->thread)
-			md_wakeup_thread(mddev->thread);
+		md_wakeup_thread(mddev->thread);
 		return err;
 	}
 

[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 1128 bytes --]


diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char 
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
 	char *avail;
+	int nextspare = 0;
 	
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {
@@ -320,6 +321,11 @@ int Assemble(struct supertype *st, char 
 			i = devcnt;
 		else
 			i = devices[devcnt].raid_disk;
+		if (i+1 == 0) {
+			if (nextspare < info.array.raid_disks)
+				nextspare = info.array.raid_disks;
+			i = nextspare++;
+		}
 		if (i < 10000) {
 			if (i >= bestcnt) {
 				unsigned int newbestcnt = i+10;

diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~	2005-09-05 10:54:55.000000000 +1000
+++ ./Manage.c	2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@ int Manage_subdevs(char *devname, int fd
 						if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
 							if (verbose >= 0)
 								fprintf(stderr, Name ": re-added %s\n", dv->devname);
-							return 0;
+							continue;
 						}
 						/* fall back on normal-add */
 					}


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  7:40           ` Neil Brown
@ 2005-09-09 11:37             ` David M. Strang
  2005-09-09 13:52               ` Mr. James W. Laferriere
  2005-09-09 20:07             ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 11:37 UTC (permalink / raw)
  To: Neil Brown, Mr. James W. Laferriere; +Cc: linux-raid maillist

NeilBrown wrote:
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.

I can attest to this; as a workaround I've been using:

mdadm --readonly /dev/mdX
mdadm --readwrite /dev/mdX

That will trigger a rebuild.


-- David

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 11:37             ` David M. Strang
@ 2005-09-09 13:52               ` Mr. James W. Laferriere
  2005-09-09 13:59                 ` David M. Strang
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 13:52 UTC (permalink / raw)
  To: David M. Strang; +Cc: Neil Brown, linux-raid maillist

 	Hello David ,  Thank you for the idea .  But ...

root@devel-0:~ # mdadm --readonly /dev/md_d0
mdadm: failed to set readonly for /dev/md_d0: Device or resource busy

 	I think I'll try Neil's upgrade to 2.6.13 & his patch to
 	mdadm .  I'll report back if that cures my problem .
 		Tnx to all ,  JimL

On Fri, 9 Sep 2005, David M. Strang wrote:
> NeilBrown wrote:
>> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
>> kernel.
>
> I can attest to this; as a workaround I've been using:
>
> mdadm --readonly /dev/mdX
> mdadm --readwrite /dev/mdX
>
> That will trigger a rebuild.
>
>
> -- David
>

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+

* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 13:52               ` Mr. James W. Laferriere
@ 2005-09-09 13:59                 ` David M. Strang
  2005-09-09 19:59                   ` Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 13:59 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist

Mr. James W. Laferriere wrote:
> Hello David ,  Thank you for the idea .  But ...
>
> root@devel-0:~ # mdadm --readonly /dev/md_d0
> mdadm: failed to set readonly for /dev/md_d0: Device or resource busy

James --

umount /dev/md_d0 first; you can remount it right after you re-enable 
writes.

That should do the trick =)

-- David 


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 13:59                 ` David M. Strang
@ 2005-09-09 19:59                   ` Mr. James W. Laferriere
  0 siblings, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 19:59 UTC (permalink / raw)
  To: David M. Strang; +Cc: Neil Brown, linux-raid maillist

 	Hello David ,  That did work .  Thank you again .  JimL
 	umount /directory
 	mdadm --readonly /dev/mdX
 	mdadm --readwrite /dev/mdX
 	mount /directory

On Fri, 9 Sep 2005, David M. Strang wrote:
> Mr. James W. Laferriere wrote:
>> Hello David ,  Thank you for the idea .  But ...
>> 
>> root@devel-0:~ # mdadm --readonly /dev/md_d0
>> mdadm: failed to set readonly for /dev/md_d0: Device or resource busy
>
> James --
>
> umount /dev/md_d0 first; you can remount it right after you re-enable writes.
>
> That should do the trick =)
>
> -- David

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  7:40           ` Neil Brown
  2005-09-09 11:37             ` David M. Strang
@ 2005-09-09 20:07             ` Mr. James W. Laferriere
  2005-09-09 20:58               ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
  2005-09-09 21:49               ` Drive fails & raid6 array is not self rebuild Neil Brown
  1 sibling, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 20:07 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid maillist


 	Hello Neil ,  I applied the patches , all were successful .  But after a
 	make clean ; make
 	I get ...	Tia ,  JimL
..snip...
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:323: error: `nextspare' undeclared (first use in this function)
Assemble.c:323: error: (Each undeclared identifier is reported only once
Assemble.c:323: error: for each function it appears in.)
make: *** [Assemble.o] Error 1


On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>>> What happens if you then
>>>  mdadm /dev/md_d0 -a /dev/sda[pqrs]
>>> ??
>>  	Getting stranger & stranger .
>>
>> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> mdadm: re-added /dev/sdap
> Hmm.. mdadm bug.
>
>> root@devel-0:~ # cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
>> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
>> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
>> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
>> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
>> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
>
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.
> Attached are three patches.
> The first two are needed by 2.6.12.5 to make sure resync happens (this
> is particularly a problem for version-1 superblocks) or just upgrade
> to 2.6.13.
> The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
> actually adds all of them, and so that when you --assemble a version-1
> array with spares, the spares actually get included.
> NeilBrown
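
The mdadm symptom quoted above (one device re-added, the rest of
/dev/sda[pqrs] skipped) comes down to leaving a loop too early.  Here is
a minimal C sketch of that shape, not the actual Manage.c code:
fake_re_add, add_all_buggy and add_all_fixed are hypothetical names, and
the functions return a count of devices added (an assumption made so the
difference is visible), whereas the real code returns 0 on success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ADD_NEW_DISK ioctl; always succeeds here. */
static int fake_re_add(const char *devname)
{
	(void)devname;
	return 0;
}

/* Buggy shape: returning inside the loop stops after the first device. */
static int add_all_buggy(const char *devs[], size_t n)
{
	int added = 0;
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++) {
		if (fake_re_add(devs[i]) == 0) {
			added++;
			return added;	/* plays the role of the old 'return 0;' */
		}
	}
	return added;
}

/* Fixed shape: 'continue' moves on to the next device instead. */
static int add_all_fixed(const char *devs[], size_t n)
{
	int added = 0;
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++) {
		if (fake_re_add(devs[i]) == 0) {
			added++;
			continue;	/* plays the role of the patched 'continue;' */
		}
		/* the real code falls back on a normal add at this point */
	}
	return added;
}
```

Given the four names that /dev/sda[pqrs] expands to, add_all_buggy
handles only the first, which matches the single "re-added /dev/sdap"
line, while add_all_fixed handles all four.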

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+


* OT: lilo overwriting partition info ?
  2005-09-09 20:07             ` Mr. James W. Laferriere
@ 2005-09-09 20:58               ` Mr. James W. Laferriere
  2005-09-09 21:49               ` Drive fails & raid6 array is not self rebuild Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 20:58 UTC (permalink / raw)
  To: linux-raid maillist

 	Hello All ,  Off topic I know ...
         I have a question not related to MD .  Have you heard of
         complaints about lilo overwriting partition info on disks
         after the first 2 if those are in an raid1 ?  Or any mentions
         of lilo writing to all 16 disks causing problems ?
 		Tia ,  JimL
-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 20:07             ` Mr. James W. Laferriere
  2005-09-09 20:58               ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
@ 2005-09-09 21:49               ` Neil Brown
  2005-09-10  0:54                 ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-09 21:49 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Friday September 9, babydr@baby-dragons.com wrote:
> 
>  	Hello Neil ,  I patched all were successful .  But after a
>  	make clean ; make
>  	I get ...	Tia ,  JimL
> ..snip...
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> Assemble.c:323: error: (Each undeclared identifier is reported only once
> Assemble.c:323: error: for each function it appears in.)
> make: *** [Assemble.o] Error 1

That's odd, as the patch contained:

--- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char 
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
 	char *avail;
+	int nextspare = 0;
 	
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {


Maybe add that bit in by hand??

NeilBrown
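
The full patch posted at the end of this thread uses nextspare at
Assemble.c line 323 to pick a slot for each device.  A minimal sketch of
that slot logic follows, assuming (from the patch's 'i+1 == 0' test) that
a device with raid_disk == -1 is a spare; slot_for_device is a
hypothetical helper name, not a function in mdadm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the slot for one device.  Devices with a real role keep their
 * raid_disk number; devices with raid_disk == -1 (assumed to be spares)
 * are given consecutive slots starting just past the last active slot.
 */
static int slot_for_device(int raid_disk, int raid_disks, int *nextspare)
{
	int i = raid_disk;

	assert(nextspare != NULL);
	if (i + 1 == 0) {
		/* never hand out a slot that belongs to an active disk */
		if (*nextspare < raid_disks)
			*nextspare = raid_disks;
		i = (*nextspare)++;
	}
	return i;
}
```

With raid_disks = 4 and nextspare starting at 0, a device in role 2
stays in slot 2, and two spares in turn get slots 4 and 5.  This is why
the missing declaration mattered: without the counter there is no slot
to give the spares.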


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 21:49               ` Drive fails & raid6 array is not self rebuild Neil Brown
@ 2005-09-10  0:54                 ` Mr. James W. Laferriere
  2005-09-10 21:58                   ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-10  0:54 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid maillist

 	Hello Neil ,

On Sat, 10 Sep 2005, Neil Brown wrote:
> On Friday September 9, babydr@baby-dragons.com wrote:
>>
>>  	Hello Neil ,  I patched all were successful .  But after a
>>  	make clean ; make
>>  	I get ...	Tia ,  JimL
>> ..snip...
>> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
>> Assemble.c: In function `Assemble':
>> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
>> Assemble.c:323: error: (Each undeclared identifier is reported only once
>> Assemble.c:323: error: for each function it appears in.)
>> make: *** [Assemble.o] Error 1
>
> That's odd, as the patch contained:
>
> --- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
> +++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
> @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
> 	struct mdinfo info;
> 	struct mddev_ident_s ident2;
> 	char *avail;
> +	int nextspare = 0;
>
> 	vers = md_get_version(mdfd);
> 	if (vers <= 0) {

 	What was missing from my 2.0 sources was the 'char *avail;'
 	and patching failed on that hunk ,  which I totally missed .
 	So I hand entered the above bits as you suggested .

 	Now it fails on a warning (???) .
 	Never heard of failures on warnings before .

gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:121: warning: unused variable `avail'
make: *** [Assemble.o] Error 1

 	Would you please cut a source set to the kernel site ?
 	Say as version 2.0a so I can see the diffs against the
 	sources I have ?  Tia ,  JimL

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
+------------------------------------------------------------------+


* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-10  0:54                 ` Mr. James W. Laferriere
@ 2005-09-10 21:58                   ` Neil Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Neil Brown @ 2005-09-10 21:58 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 2287 bytes --]

On Friday September 9, babydr@baby-dragons.com wrote:
>  	Hello Neil ,
> 
> On Sat, 10 Sep 2005, Neil Brown wrote:
> > On Friday September 9, babydr@baby-dragons.com wrote:
> >>
> >>  	Hello Neil ,  I patched all were successful .  But after a
> >>  	make clean ; make
> >>  	I get ...	Tia ,  JimL
> >> ..snip...
> >> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
> >> Assemble.c: In function `Assemble':
> >> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> >> Assemble.c:323: error: (Each undeclared identifier is reported only once
> >> Assemble.c:323: error: for each function it appears in.)
> >> make: *** [Assemble.o] Error 1
> >
> > That's odd, as the patch contained:
> >
> > --- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
> > +++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
> > @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
> > 	struct mdinfo info;
> > 	struct mddev_ident_s ident2;
> > 	char *avail;
> > +	int nextspare = 0;
> >
> > 	vers = md_get_version(mdfd);
> > 	if (vers <= 0) {
> 
>  	What was missing from my 2.0 sources was the 'char *avail;'
>  	and patching failed on that hunk ,  Which totally missed .

The 'avail' is for a different independent patch which fixes a raid10
issue.
You can ignore it.

>  	So I hand entered as you suggested the above bits .
> 
>  	Now it fails on a warning (???) .

I guess you didn't ignore it. 

Just add the 'int nextspare = 0;' to what you had.  Don't worry that
the 'char *avail;' isn't there.

>  	Never heard of failures on warnings before .

That would be because of the '-Werror' I put in there to make sure I
don't get lazy about warnings.

> 
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\"   -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:121: warning: unused variable `avail'
> make: *** [Assemble.o] Error 1
> 
>  	Would you please cut a source set to the kernel site >
>  	Say as version 2.0a so I can see the diffs against the
>  	sources I have ?  Tia ,  JimL

I hope to do a 2.1 next week.  Here is the current complete patch
against 2.0.
NeilBrown



[-- Attachment #2: mdadm.diff --]
[-- Type: application/octet-stream, Size: 6338 bytes --]

diff -ru /var/tmp/mdadm-old/mdadm-2.0/Assemble.c /var/tmp/mdadm-new/mdadm-2.0/Assemble.c
--- /var/tmp/mdadm-old/mdadm-2.0/Assemble.c	2005-08-15 16:31:57.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -118,6 +118,8 @@
 	mddev_dev_t tmpdev;
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
+	char *avail;
+	int nextspare = 0;
 	
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {
@@ -319,6 +321,11 @@
 			i = devcnt;
 		else
 			i = devices[devcnt].raid_disk;
+		if (i+1 == 0) {
+			if (nextspare < info.array.raid_disks)
+				nextspare = info.array.raid_disks;
+			i = nextspare++;
+		}
 		if (i < 10000) {
 			if (i >= bestcnt) {
 				unsigned int newbestcnt = i+10;
@@ -359,6 +366,8 @@
 	/* now we have some devices that might be suitable.
 	 * I wonder how many
 	 */
+	avail = malloc(info.array.raid_disks);
+	memset(avail, 0, info.array.raid_disks);
 	okcnt = 0;
 	sparecnt=0;
 	for (i=0; i< bestcnt ;i++) {
@@ -377,13 +386,16 @@
 		if (devices[j].events+event_margin >=
 		    devices[most_recent].events) {
 			devices[j].uptodate = 1;
-			if (i < info.array.raid_disks)
+			if (i < info.array.raid_disks) {
 				okcnt++;
-			else
+				avail[i]=1;
+			} else
 				sparecnt++;
 		}
 	}
-	while (force && !enough(info.array.level, info.array.raid_disks, okcnt)) {
+	while (force && !enough(info.array.level, info.array.raid_disks,
+				info.array.layout,
+				avail, okcnt)) {
 		/* Choose the newest best drive which is
 		 * not up-to-date, update the superblock
 		 * and add it.
@@ -434,6 +446,7 @@
 		close(fd);
 		devices[chosen_drive].events = devices[most_recent].events;
 		devices[chosen_drive].uptodate = 1;
+		avail[chosen_drive] = 1;
 		okcnt++;
 		free(super);
 	}
@@ -599,7 +612,7 @@
 		
 		if (runstop == 1 ||
 		    (runstop == 0 && 
-		     ( enough(info.array.level, info.array.raid_disks, okcnt) &&
+		     ( enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt) &&
 		       (okcnt >= req_cnt || start_partial_ok)
 			     ))) {
 			if (ioctl(mdfd, RUN_ARRAY, NULL)==0) {
@@ -627,7 +640,7 @@
 			fprintf(stderr, Name ": %s assembled from %d drive%s", mddev, okcnt, okcnt==1?"":"s");
 			if (sparecnt)
 				fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
-			if (!enough(info.array.level, info.array.raid_disks, okcnt))
+			if (!enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt))
 				fprintf(stderr, " - not enough to start the array.\n");
 			else {
 				if (req_cnt == info.array.raid_disks)
diff -ru /var/tmp/mdadm-old/mdadm-2.0/Manage.c /var/tmp/mdadm-new/mdadm-2.0/Manage.c
--- /var/tmp/mdadm-old/mdadm-2.0/Manage.c	2005-08-26 14:49:25.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Manage.c	2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@
 						if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
 							if (verbose >= 0)
 								fprintf(stderr, Name ": re-added %s\n", dv->devname);
-							return 0;
+							continue;
 						}
 						/* fall back on normal-add */
 					}
diff -ru /var/tmp/mdadm-old/mdadm-2.0/mdadm.h /var/tmp/mdadm-new/mdadm-2.0/mdadm.h
--- /var/tmp/mdadm-old/mdadm-2.0/mdadm.h	2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/mdadm.h	2005-09-05 10:55:01.000000000 +1000
@@ -291,7 +291,8 @@
 extern int same_uuid(int a[4], int b[4], int swapuuid);
 /* extern int compare_super(mdp_super_t *first, mdp_super_t *second);*/
 extern unsigned long calc_csum(void *super, int bytes);
-extern int enough(int level, int raid_disks, int avail_disks);
+extern int enough(int level, int raid_disks, int layout,
+		   char *avail, int avail_disks);
 extern int ask(char *mesg);
 
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super0.c /var/tmp/mdadm-new/mdadm-2.0/super0.c
--- /var/tmp/mdadm-old/mdadm-2.0/super0.c	2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super0.c	2005-09-05 10:55:01.000000000 +1000
@@ -131,6 +131,10 @@
 		c = map_num(r5layout, sb->layout);
 		printf("         Layout : %s\n", c?c:"-unknown-");
 	}
+	if (sb->level == 10) {
+		printf("         Layout : near=%d, far=%d\n",
+		       sb->layout&255, (sb->layout>>8)&255);
+	}
 	switch(sb->level) {
 	case 0:
 	case 4:
@@ -234,6 +238,7 @@
 	info->array.patch_version = sb->patch_version;
 	info->array.raid_disks = sb->raid_disks;
 	info->array.level = sb->level;
+	info->array.layout = sb->layout;
 	info->array.md_minor = sb->md_minor;
 	info->array.ctime = sb->ctime;
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super1.c /var/tmp/mdadm-new/mdadm-2.0/super1.c
--- /var/tmp/mdadm-old/mdadm-2.0/super1.c	2005-08-26 16:07:33.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super1.c	2005-09-05 10:55:01.000000000 +1000
@@ -180,6 +180,11 @@
 		c = map_num(r5layout, __le32_to_cpu(sb->layout));
 		printf("         Layout : %s\n", c?c:"-unknown-");
 	}
+	if (__le32_to_cpu(sb->level) == 10) {
+		int lo = __le32_to_cpu(sb->layout);
+		printf("         Layout : near=%d, far=%d\n",
+		       lo&255, (lo>>8)&255);
+	}
 	switch(__le32_to_cpu(sb->level)) {
 	case 0:
 	case 4:
@@ -290,6 +295,7 @@
 	info->array.patch_version = 0;
 	info->array.raid_disks = __le32_to_cpu(sb->raid_disks);
 	info->array.level = __le32_to_cpu(sb->level);
+	info->array.layout = __le32_to_cpu(sb->layout);
 	info->array.md_minor = -1;
 	info->array.ctime = __le64_to_cpu(sb->ctime);
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/util.c /var/tmp/mdadm-new/mdadm-2.0/util.c
--- /var/tmp/mdadm-old/mdadm-2.0/util.c	2005-08-17 14:28:38.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/util.c	2005-09-05 10:55:01.000000000 +1000
@@ -118,10 +118,31 @@
 	return (a*1000000)+(b*1000)+c;
 }
 
-int enough(int level, int raid_disks, int avail_disks)
+int enough(int level, int raid_disks, int layout,
+	   char *avail, int avail_disks)
 {
+	int copies, first;
 	switch (level) {
-	case 10: return 1; /* a lie, but it is hard to tell */
+	case 10:
+		/* This is the tricky one - we need to check
+		 * which actual disks are present.
+		 */
+		copies = (layout&255)* (layout>>8);
+		first=0;
+		do {
+			/* there must be one of the 'copies' from 'first' */
+			int n = copies;
+			int cnt=0;
+			while (n--) {
+				if (avail[first])
+					cnt++;
+				first = (first+1) % raid_disks;
+			}
+			if (cnt == 0)
+				return 0;
+
+		} while (first != 0);
+		return 1;
 
 	case -4:
 		return avail_disks>= 1;
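
The raid10 branch of enough() in the util.c hunk above can be tried on
its own.  A minimal sketch, assuming the layout encoding shown in the
super0.c hunk (near copies in bits 0-7, far copies in bits 8-15);
raid10_enough is a hypothetical name, and the extra '& 255' mask on the
far count is an addition that the patch itself does not have.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk the devices in groups of 'copies' and require at least one
 * available device in every group.  avail[i] is non-zero when device i
 * is present and up to date.  Returns 1 if the array can start, else 0.
 */
static int raid10_enough(int raid_disks, int layout, const char *avail)
{
	int copies = (layout & 255) * ((layout >> 8) & 255);
	int first = 0;

	assert(raid_disks > 0);
	assert(avail != NULL);
	do {
		int n = copies;
		int cnt = 0;

		while (n--) {
			if (avail[first])
				cnt++;
			first = (first + 1) % raid_disks;
		}
		if (cnt == 0)
			return 0;	/* every copy in this group is missing */
	} while (first != 0);
	return 1;
}
```

For a 4-disk array with near=2, far=1 (layout 0x102), losing disks 1 and
2 still leaves one copy in each pair, so the check passes; losing disks 0
and 1 removes both copies of the first pair, so it fails.  The old
'return 1' could not tell these two cases apart.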


end of thread, other threads:[~2005-09-10 21:58 UTC | newest]

Thread overview: 18+ messages
2005-09-08 11:38 my first raid disaster on reboot :o( update Ken Walker
2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
2005-09-08 19:34   ` Molle Bestefich
2005-09-08 21:09   ` Neil Brown
2005-09-08 21:39     ` Mr. James W. Laferriere
2005-09-09  0:50       ` Neil Brown
2005-09-09  2:05         ` Mr. James W. Laferriere
2005-09-09  2:15           ` Mr. James W. Laferriere
2005-09-09  7:40           ` Neil Brown
2005-09-09 11:37             ` David M. Strang
2005-09-09 13:52               ` Mr. James W. Laferriere
2005-09-09 13:59                 ` David M. Strang
2005-09-09 19:59                   ` Mr. James W. Laferriere
2005-09-09 20:07             ` Mr. James W. Laferriere
2005-09-09 20:58               ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
2005-09-09 21:49               ` Drive fails & raid6 array is not self rebuild Neil Brown
2005-09-10  0:54                 ` Mr. James W. Laferriere
2005-09-10 21:58                   ` Neil Brown
