* RE: my first raid disaster on reboot :o( update
@ 2005-09-08 11:38 Ken Walker
2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
0 siblings, 1 reply; 18+ messages in thread
From: Ken Walker @ 2005-09-08 11:38 UTC (permalink / raw)
To: linux-raid
I'm getting confused again.
I installed Debian 3.1 onto two SCSI drives set up as raid1.
I also set-up the four ide drives, during installation and set them as
/dev/md7 using /dev/hda,/dev/hdc
/dev/md8 using /dev/hdb,/dev/hdd
both ext3
they started, and sync'd
on reboot, md7 and md8 didn't auto start.
so i created them again with
mdadm -C /dev/md7 -l1 -n2 /dev/hda /dev/hdc
this started rebuilding.
then i did the same for md8
mdadm -C /dev/md8 -l1 -n2 /dev/hdb /dev/hdd
then i did
mkfs.ext3 /dev/md7
mkfs.ext3 /dev/md8
I checked with Fdisk that they were all set as FD.
then i did
(i made a copy of the original mdadm.conf first.)
mdadm --detail --scan > mdadm.conf
And on reboot only md0 would mount.
So i copied the original mdadm.conf back and rebooted, and all the raids
apart from md7 and md8 started.
I noticed at the top of the original mdadm.conf i had the following
DEVICE partitions
so i did
mdadm --detail --scan > mdadm.conf
again, with md7 and md8 running and rebooted.
adding
DEVICE partitions
back to the top
The system booted up but again without md7 or md8, it did its corrupt
superblock or ext2 file system complaints.
But I'm getting confused,
because, on
http://www.linuxdevcenter.com/pub/a/linux/2002/12/05/RAID.html
which is where i got the
mdadm --detail --scan > mdadm.conf
from,
the example he gives
DEVICE /dev/sdb1 /dev/sdc1
ARRAY /dev/md0 level=raid0 num-devices=2
UUID=410a299e:4cdd535e:169d3df4:48b7144a
is the other way round in my mdadm.conf file, i have
ARRAY /dev/md0 level=raid0 num-devices=2
UUID=410a299e:4cdd535e:169d3df4:48b7144a
DEVICE /dev/sdb1 /dev/sdc1
Which way round should it be?
I have also read that a mdadm.conf file isn't really needed, but can be
helpful. If i hide my mdadm.conf file, will the system boot with md7 and md8?
I do have those two raids in my fstab file at the end as
/dev/md7 /Cad100 ext3 defaults 0 2
/dev/md8 /Cad200 ext3 defaults 0 2
Can anybody help :o(
Ken
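Note on the mdadm.conf steps above: the shell's > redirection truncates the target file, so sending the scan output straight into mdadm.conf discards any DEVICE line that was already there, which matches what Ken noticed. The following is a minimal sketch, not a definitive recipe, that writes a DEVICE line first and then whatever ARRAY lines arrive on stdin. The function name build_mdadm_conf and the output file name mdadm.conf.new are illustrative assumptions and are not part of mdadm.

```shell
# Sketch: emit a complete mdadm.conf on stdout.
# $1 is the DEVICE specification (defaults to "partitions").
# ARRAY lines, including any indented continuation lines, are read
# unchanged from stdin, for example from "mdadm --detail --scan".
build_mdadm_conf() {
    printf 'DEVICE %s\n' "${1:-partitions}"
    cat
}

# Possible usage (hypothetical output file name, review before installing):
#   mdadm --detail --scan | build_mdadm_conf partitions > mdadm.conf.new
```

Writing to a separate file first leaves the original configuration untouched until the result has been checked.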
-----Original Message-----
From: Ken Walker [mailto:ken.walker@manchester.ac.uk]
Sent: 06 September 2005 2:26 pm
To: linux-raid@vger.kernel.org
Subject: my first raid disaster on reboot :o(
I've got debian 3.1, kernel 2.6 installed on a machine with two 9.1g SCSI
and 4 160g IDE's.
The SCSI is split up into / /usr /var /swap /tmp and /home, each set as
a raid1.
The IDE's are set up as raid1 on the ide channels, such that hda is mirrored
with hdc and hdb is mirrored with hdd.
I had to move the system today so powered down with shutdown -h now.
On reboot i just get / mounted ( i think ) and everything else says mdx
corrupt superblock or such and not a valid ext2 fs.
all the mirrors were set up as ext3 and when it was up and running
/proc/mdstat said all was well.
/etc/fstab has all the raids present.
I'm kinda stuck as to where to start.
Could anybody point me in the right direction please.
many thanks
Ken
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Drive fails & raid6 array is not self rebuild .
2005-09-08 11:38 my first raid disaster on reboot :o( update Ken Walker
@ 2005-09-08 18:54 ` Mr. James W. Laferriere
2005-09-08 19:34 ` Molle Bestefich
2005-09-08 21:09 ` Neil Brown
0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 18:54 UTC (permalink / raw)
To: linux-raid maillist
Hello All , Is there a documented procedure to follow during
creation or after that will get a raid6 array to self
rebuild ?
Why I am asking .
I was getting the errors below at a heavy rate , so ...
Sep 7 20:11:49 localhost kernel: scsi2 (2:0): rejecting I/O to dead
device
Sep 7 20:11:49 localhost kernel: md: write_disk_sb failed for device
sde
Sep 7 20:11:49 localhost kernel: md: excessive errors occurred during
superblock update, exiting
Sep 7 20:11:49 localhost kernel: raid5: Disk failure on sde,
disabling device. Operation continuing on 35 devices
I ran the below & the above messages stopped . But the array
(appears to have) never tried rebuilding .
# mdadm --manage --fail /dev/md_d0 /dev/sde
The problem arose because the drive died totally . ie:
root@devel-0:/ # fdisk /dev/sde
Unable to open /dev/sde
# cat /proc/mdstat
...snip...
md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
[UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
...snip...
# cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4
UUID=2006d8c6:71918820:247e00b0:460d5bc1
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | 3542 Broken Yoke Dr. | Give me Linux |
| babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP |
+------------------------------------------------------------------+
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
@ 2005-09-08 19:34 ` Molle Bestefich
2005-09-08 21:09 ` Neil Brown
1 sibling, 0 replies; 18+ messages in thread
From: Molle Bestefich @ 2005-09-08 19:34 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
Mr. James W. Laferriere wrote:
> Is there a documented procedure to follow during
> creation or after that will get a raid6 array to self
> rebuild ?
MD will rebuild your array automatically, given that it has a spare disk to use.
> raid5: Disk failure on sde, disabling device. Operation continuing on 35 devices
Seems like a raid5, not raid6..
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
No need to do any rebuilding on the remaining devices, since the data
on them are fine.
You've lost redundancy however, so you should add a new disk to the array ASAP.
With 35 disks, I'd recommend that you at least use raid6 in place of raid5..
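For reference, the [36/35] counter and the [UU_U...] map in the /proc/mdstat output quoted earlier are what show the lost redundancy: 36 slots, 35 of them active. A minimal sketch for picking out degraded arrays from that kind of text follows. The function name degraded_arrays is an illustrative assumption, and the parsing relies only on the line shapes visible in this thread.

```shell
# Print "name active/total" for every md array whose [total/active]
# counters differ, i.e. every array running degraded.
# Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[^ ]* *:/ { name = $1 }              # e.g. "md_d0 : active raid5 ..."
        match($0, /\[[0-9]+\/[0-9]+\]/) {        # e.g. "[36/35]"
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[1] != n[2]) print name, n[2] "/" n[1]
        }
    '
}

# Possible usage:
#   degraded_arrays < /proc/mdstat
```

On the output quoted in this thread it would report md_d0 with 35 of 36 members active and stay silent for the healthy raid1 arrays.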
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
2005-09-08 19:34 ` Molle Bestefich
@ 2005-09-08 21:09 ` Neil Brown
2005-09-08 21:39 ` Mr. James W. Laferriere
1 sibling, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-08 21:09 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
On Thursday September 8, babydr@baby-dragons.com wrote:
> Hello All , Is there a documented procedure to follow during
> creation or after that will get a raid6 array to self
> rebuild ?
I suspect a kernel upgrade would do the trick, though you don't say
what kernel you are running.
You could probably kick it along by removing and re-adding your spare:
mdadm /dev/md_d0 --remove /dev/sdao
mdadm /dev/md_d0 --add /dev/sdao
(And I assume you mean 'raid5' rather than 'raid6', not that it
matters..)
NeilBrown
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-08 21:09 ` Neil Brown
@ 2005-09-08 21:39 ` Mr. James W. Laferriere
2005-09-09 0:50 ` Neil Brown
0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 21:39 UTC (permalink / raw)
To: linux-raid maillist
Hello Neil , Inline .
On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>> Hello All , Is there a documented procedure to follow during
>> creation or after that will get a raid6 array to self
>> rebuild ?
> I suspect a kernel upgrade would do the trick, though you don't say
> what kernel you are running.
> You could probably kick it along by removing and re-adding your spare:
> mdadm /dev/md_d0 --remove /dev/sdao
> mdadm /dev/md_d0 --add /dev/sdao
>
> (And I assume you mean 'raid5' rather than 'raid6', not that it
> matters..)
Sorry , yes I meant raid5 .
My kernel version is .
root@devel-0:/ # uname -a
Linux devel-0 2.6.12.5 #1 SMP Fri Aug 26 20:09:46 UTC 2005 i686 GNU/Linux
When I try to do the remove I get .
root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
mdadm: hot remove failed for /dev/sdao: Device or resource busy
I should also have 3 other drives that are spares . I could
try hot remove on one of them . See at bottom the output of
mdadm --misc -Q --detail /dev/md_d0
Which is showing no spare drives ? And I built it with 4
spares
root@devel-0:~ # cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4 UUID=2006d8c6:71918820:247e00b0:460d5bc1
c-l is 10 devices (one is dead 'e' leaves 9) .
n-w is 10 devices
yz is 2 devices
aa-h is 8 devices
aj-s is 10 devices
----------
40 devices given in mdadm.conf
-1 dead device .
----------
39 devices
36 devices used (per /proc/mdstat)
----------
3 devices for spares .
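The per-pattern counts above can be double-checked without touching real device nodes. The sketch below creates empty stand-in files in a scratch directory and lets the shell expand the same glob patterns that appear on the DEV line; the function name count_dev_globs and the scratch-directory approach are illustrative assumptions.

```shell
# Count how many names each glob pattern would match, using empty
# stand-in files in a scratch directory instead of real /dev nodes.
count_dev_globs() {
    scratch=$(mktemp -d) || return 1
    for l in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        : > "$scratch/sd$l"     # sda .. sdz
        : > "$scratch/sda$l"    # sdaa .. sdaz
    done
    total=0
    for pat in "$@"; do
        # $pat is left unquoted on purpose so the shell expands it
        # inside the scratch directory.
        n=$(cd "$scratch" && ls -d $pat 2>/dev/null | wc -l | tr -d ' ')
        echo "$pat: $n"
        total=$((total + n))
    done
    echo "total: $total"
    rm -rf "$scratch"
}

count_dev_globs 'sd[c-l]' 'sd[n-w]' 'sd[yz]' 'sda[a-h]' 'sda[j-s]'
```

This only confirms how many names the patterns cover (40 in total); it says nothing about which of those devices the array actually holds.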
>> # cat /proc/mdstat
>> ...snip...
>> md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
>> sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
>> sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
>> sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
>> sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>> 1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
/dev/md_d0:
Version : 01.02.01
Creation Time : Sun Aug 28 17:46:59 2005
Raid Level : raid5
Array Size : 1244826240 (1187.16 GiB 1274.70 GB)
Device Size : 35566464 (33.92 GiB 36.42 GB)
Raid Devices : 36
Total Devices : 36
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Sep 8 06:26:10 2005
State : clean, degraded
Active Devices : 35
Working Devices : 35
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name :
UUID : 2006d8c6:71918820:247e00b0:460d5bc1
Events : 5308
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd
0 0 0 0 removed
3 8 80 3 active sync /dev/sdf
4 8 96 4 active sync /dev/sdg
5 8 112 5 active sync /dev/sdh
6 8 128 6 active sync /dev/sdi
7 8 144 7 active sync /dev/sdj
8 8 160 8 active sync /dev/sdk
9 8 176 9 active sync /dev/sdl
10 8 208 10 active sync /dev/sdn
11 8 224 11 active sync /dev/sdo
12 8 240 12 active sync /dev/sdp
13 65 0 13 active sync /dev/sdq
14 65 16 14 active sync /dev/sdr
15 65 32 15 active sync /dev/sds
16 65 48 16 active sync /dev/sdt
17 65 64 17 active sync /dev/sdu
18 65 80 18 active sync /dev/sdv
19 65 96 19 active sync /dev/sdw
20 65 128 20 active sync /dev/sdy
21 65 144 21 active sync /dev/sdz
22 65 160 22 active sync /dev/sdaa
23 65 176 23 active sync /dev/sdab
24 65 192 24 active sync /dev/sdac
25 65 208 25 active sync /dev/sdad
26 65 224 26 active sync /dev/sdae
27 65 240 27 active sync /dev/sdaf
28 66 0 28 active sync /dev/sdag
29 66 16 29 active sync /dev/sdah
30 66 48 30 active sync /dev/sdaj
31 66 64 31 active sync /dev/sdak
32 66 80 32 active sync /dev/sdal
33 66 96 33 active sync /dev/sdam
34 66 112 34 active sync /dev/sdan
40 66 128 35 active sync /dev/sdao
2 8 64 - faulty spare /dev/sde
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-08 21:39 ` Mr. James W. Laferriere
@ 2005-09-09 0:50 ` Neil Brown
2005-09-09 2:05 ` Mr. James W. Laferriere
0 siblings, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-09 0:50 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
On Thursday September 8, babydr@baby-dragons.com wrote:
>
> When I try to do the remove I get .
> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>
> I should also have 3 other drives that are spares . I could
> try hot remove on one of them . See at bottom the output of
> mdadm --misc -Q --detail /dev/md_d0
> Which is showing no spare drives ? And I built it with 4
> spares
Yes... /dev/sda[pqrs] are missing. I wonder why..
What does
mdadm -E /dev/sda[pqrs]
show?
What happens if you then
mdadm /dev/md_d0 -a /dev/sda[pqrs]
??
NeilBrown
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 0:50 ` Neil Brown
@ 2005-09-09 2:05 ` Mr. James W. Laferriere
2005-09-09 2:15 ` Mr. James W. Laferriere
2005-09-09 7:40 ` Neil Brown
0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 2:05 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist
Hello Neil ,
On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>> When I try to do the remove I get .
>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>
>> I should also have 3 other drives that are spares . I could
>> try hot remove on one of them . See at bottom the output of
>> mdadm --misc -Q --detail /dev/md_d0
>> Which is showing no spare drives ? And I built it with 4
>> spares
>
> Yes... /dev/sda[pqrs] are missing. I wonder why..
>
> What does
> mdadm -E /dev/sda[pqrs]
> show?
See way below .
> What happens if you then
> mdadm /dev/md_d0 -a /dev/sda[pqrs]
> ??
Getting stranger & stranger .
root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
mdadm: re-added /dev/sdap
root@devel-0:~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35] [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
md1 : active raid1 sdb2[0] sda2[1]
1003968 blocks [2/2] [UU]
md2 : active raid1 sdb3[0] sda3[1]
34700288 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
136448 blocks [2/2] [UU]
unused devices: <none>
It appears they think they're still part of the array .
root@devel-0:~ # mdadm -E /dev/sda[pqrs]
/dev/sdap:
Magic : a92b4efc
Version : 01.00
Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
Name :
Creation Time : Sun Aug 28 17:46:59 2005
Raid Level : raid5
Raid Devices : 36
Device Size : 71132943 (33.92 GiB 36.42 GB)
Data Offset : 16 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c083f71d:ce15a0aa:24341675:45ec6e3e
Update Time : Sun Aug 28 20:43:06 2005
Checksum : dc216e5 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 64K
Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdaq:
Magic : a92b4efc
Version : 01.00
Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
Name :
Creation Time : Sun Aug 28 17:46:59 2005
Raid Level : raid5
Raid Devices : 36
Device Size : 71132943 (33.92 GiB 36.42 GB)
Data Offset : 16 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 430b9730:4416eb44:2f793e78:a3a92cc1
Update Time : Sun Aug 28 20:43:06 2005
Checksum : 4092a148 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 64K
Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdar:
Magic : a92b4efc
Version : 01.00
Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
Name :
Creation Time : Sun Aug 28 17:46:59 2005
Raid Level : raid5
Raid Devices : 36
Device Size : 71132943 (33.92 GiB 36.42 GB)
Data Offset : 16 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 33ea7f64:976740bb:ff88e4bc:84534774
Update Time : Sun Aug 28 20:43:06 2005
Checksum : e2918b3d - correct
Events : 1
Layout : left-symmetric
Chunk Size : 64K
Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdas:
Magic : a92b4efc
Version : 01.00
Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
Name :
Creation Time : Sun Aug 28 17:46:59 2005
Raid Level : raid5
Raid Devices : 36
Device Size : 71132943 (33.92 GiB 36.42 GB)
Data Offset : 16 sectors
Super Offset : 8 sectors
State : clean
Device UUID : acb2ea9d:7c3f3b6e:98d9f85c:c8cb2bae
Update Time : Sun Aug 28 20:43:06 2005
Checksum : a8eff479 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 64K
Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
root@devel-0:~ #
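One detail stands out in the output above: all four devices report Events : 1 and an update time of Aug 28, while the mdadm --detail output earlier in the thread shows Events : 5308 for the running array. That would be consistent with the spares' superblocks not having been written since creation, although nothing in this part of the thread confirms the cause. A minimal sketch for pulling the event counters out of that kind of output follows; the function name examine_events is an illustrative assumption.

```shell
# Read "mdadm -E"-style text on stdin and print "device events" pairs.
examine_events() {
    awk '
        /^\/dev\// { sub(/:$/, "", $1); dev = $1 }   # "/dev/sdap:" starts a device
        $1 == "Events" { print dev, $3 }             # "Events : 1"
    '
}

# Possible usage:
#   mdadm -E /dev/sda[pqrs] | examine_events
```

Comparing these values with the Events line from mdadm --detail for the array gives a quick view of which members have fallen behind.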
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 2:05 ` Mr. James W. Laferriere
@ 2005-09-09 2:15 ` Mr. James W. Laferriere
2005-09-09 7:40 ` Neil Brown
1 sibling, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 2:15 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist
Hello Neil ,
On Thu, 8 Sep 2005, Mr. James W. Laferriere wrote:
> On Fri, 9 Sep 2005, Neil Brown wrote:
>> On Thursday September 8, babydr@baby-dragons.com wrote:
>>> When I try to do the remove I get .
>>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>>
>>> I should also have 3 other drives that are spares . I could
>>> try hot remove on one of them . See at bottom the output of
>>> mdadm --misc -Q --detail /dev/md_d0
>>> Which is showing no spare drives ? And I built it with 4
>>> spares
>>
>> Yes... /dev/sda[pqrs] are missing. I wonder why..
>>
>> What does
>> mdadm -E /dev/sda[pqrs]
>> show?
> See way below .
>
>> What happens if you then
>> mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> ??
>
> Getting stranger & stranger .
>
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap
Are there any debugging options I can enable from the
boot: prompt or compile in ?
just some more info ... Hth , JimL
# dmesg | tail -n 43
RAID5 conf printout:
--- rd:36 wd:35 fd:1
disk 0, o:1, dev:sdc
disk 1, o:1, dev:sdd
disk 3, o:1, dev:sdf
disk 4, o:1, dev:sdg
disk 5, o:1, dev:sdh
disk 6, o:1, dev:sdi
disk 7, o:1, dev:sdj
disk 8, o:1, dev:sdk
disk 9, o:1, dev:sdl
disk 10, o:1, dev:sdn
disk 11, o:1, dev:sdo
disk 12, o:1, dev:sdp
disk 13, o:1, dev:sdq
disk 14, o:1, dev:sdr
disk 15, o:1, dev:sds
disk 16, o:1, dev:sdt
disk 17, o:1, dev:sdu
disk 18, o:1, dev:sdv
disk 19, o:1, dev:sdw
disk 20, o:1, dev:sdy
disk 21, o:1, dev:sdz
disk 22, o:1, dev:sdaa
disk 23, o:1, dev:sdab
disk 24, o:1, dev:sdac
disk 25, o:1, dev:sdad
disk 26, o:1, dev:sdae
disk 27, o:1, dev:sdaf
disk 28, o:1, dev:sdag
disk 29, o:1, dev:sdah
disk 30, o:1, dev:sdaj
disk 31, o:1, dev:sdak
disk 32, o:1, dev:sdal
disk 33, o:1, dev:sdam
disk 34, o:1, dev:sdan
disk 35, o:1, dev:sdao
md: cannot remove active disk sdao from md_d0 ...
md: cannot remove active disk sdao from md_d0 ...
md: bind<sdap>
md: bind<sdaq>
md: bind<sdar>
md: bind<sdas>
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 2:05 ` Mr. James W. Laferriere
2005-09-09 2:15 ` Mr. James W. Laferriere
@ 2005-09-09 7:40 ` Neil Brown
2005-09-09 11:37 ` David M. Strang
2005-09-09 20:07 ` Mr. James W. Laferriere
1 sibling, 2 replies; 18+ messages in thread
From: Neil Brown @ 2005-09-09 7:40 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 1300 bytes --]
On Thursday September 8, babydr@baby-dragons.com wrote:
> > What happens if you then
> > mdadm /dev/md_d0 -a /dev/sda[pqrs]
> > ??
>
> Getting stranger & stranger .
>
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap
>
Hmm.. mdadm bug.
> root@devel-0:~ # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
> 1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
kernel.
Attached are three patches.
The first two are needed by 2.6.12.5 to make sure resync happens (this
is particularly a problem for version-1 superblocks) or just upgrade
to 2.6.13.
The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
actually adds all of them, and so that when you --assemble a version-1
array with spares, the spares actually get included.
NeilBrown
[-- Attachment #2: 349MdHotAddFix --]
[-- Type: text/plain, Size: 786 bytes --]
Status: ok
Make sure recovery happens when add_new_disk is used for hot_add
Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 2 ++
1 files changed, 2 insertions(+)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c 2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
err = bind_rdev_to_array(rdev, mddev);
if (err)
export_rdev(rdev);
+
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
if (mddev->thread)
md_wakeup_thread(mddev->thread);
return err;
[-- Attachment #3: 418MdWakeThread --]
[-- Type: text/plain, Size: 1420 bytes --]
Status: ok
Make sure resync gets started when array starts.
We weren't actually waking up the md thread after setting
MD_RECOVERY_NEEDED when assembling an array, so it is possible to
lose a race and not actually start resync.
So add a call to md_wakeup_thread, and while we are at it, remove
all the "if (mddev->thread)" guards as md_wakeup_thread does its own
checking.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c 2005-08-26 17:00:30.000000000 +1000
+++ ./drivers/md/md.c~current~ 2005-08-26 17:00:39.000000000 +1000
@@ -256,8 +256,7 @@ static inline void mddev_unlock(mddev_t
{
up(&mddev->reconfig_sem);
- if (mddev->thread)
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(mddev->thread);
}
mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
@@ -1726,6 +1725,7 @@ static int do_md_run(mddev_t * mddev)
mddev->in_sync = 1;
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
if (mddev->sb_dirty)
md_update_sb(mddev);
@@ -2255,8 +2255,7 @@ static int add_new_disk(mddev_t * mddev,
export_rdev(rdev);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- if (mddev->thread)
- md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(mddev->thread);
return err;
}
[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 1128 bytes --]
diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~ 2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c 2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
struct mdinfo info;
struct mddev_ident_s ident2;
char *avail;
+ int nextspare = 0;
vers = md_get_version(mdfd);
if (vers <= 0) {
@@ -320,6 +321,11 @@ int Assemble(struct supertype *st, char
i = devcnt;
else
i = devices[devcnt].raid_disk;
+ if (i+1 == 0) {
+ if (nextspare < info.array.raid_disks)
+ nextspare = info.array.raid_disks;
+ i = nextspare++;
+ }
if (i < 10000) {
if (i >= bestcnt) {
unsigned int newbestcnt = i+10;
diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~ 2005-09-05 10:54:55.000000000 +1000
+++ ./Manage.c 2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@ int Manage_subdevs(char *devname, int fd
if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
if (verbose >= 0)
fprintf(stderr, Name ": re-added %s\n", dv->devname);
- return 0;
+ continue;
}
/* fall back on normal-add */
}
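The Manage.c hunk above swaps a return for a continue inside the per-device loop, which is why only /dev/sdap was reported as re-added earlier in the thread. The effect of that one-word change can be modelled in a few lines of shell. This is a toy model of the control flow only, not mdadm's code, and the function names add_all_buggy and add_all_fixed are hypothetical.

```shell
# Stops after the first device it handles, like the "return 0" path.
add_all_buggy() {
    for dev in "$@"; do
        echo "re-added $dev"
        return 0
    done
}

# Moves on to the next device, like the "continue" path.
add_all_fixed() {
    for dev in "$@"; do
        echo "re-added $dev"
        continue
    done
}

add_all_buggy /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
add_all_fixed /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
```

The first call prints a single line for /dev/sdap; the second prints one line per device.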
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 7:40 ` Neil Brown
@ 2005-09-09 11:37 ` David M. Strang
2005-09-09 13:52 ` Mr. James W. Laferriere
2005-09-09 20:07 ` Mr. James W. Laferriere
1 sibling, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 11:37 UTC (permalink / raw)
To: Neil Brown, Mr. James W. Laferriere; +Cc: linux-raid maillist
NeilBrown wrote:
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.
I can attest to this; as a workaround I've been using:
mdadm --readonly /dev/mdX
mdadm --readwrite /dev/mdX
That will trigger a rebuild.
-- David
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 11:37 ` David M. Strang
@ 2005-09-09 13:52 ` Mr. James W. Laferriere
2005-09-09 13:59 ` David M. Strang
0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 13:52 UTC (permalink / raw)
To: David M. Strang; +Cc: Neil Brown, linux-raid maillist
Hello David , Thank you for the idea . But ...
root@devel-0:~ # mdadm --readonly /dev/md_d0
mdadm: failed to set readonly for /dev/md_d0: Device or resource busy
I think I'll try Neil's upgrade to 2.6.13 & his patch to
mdadm . I'll report back if that cures my problem .
Tnx to all , JimL
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 13:52 ` Mr. James W. Laferriere
@ 2005-09-09 13:59 ` David M. Strang
2005-09-09 19:59 ` Mr. James W. Laferriere
0 siblings, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 13:59 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist
Mr. James W. Laferriere wrote:
> Hello David , Thank you for the idea . But ...
>
> root@devel-0:~ # mdadm --readonly /dev/md_d0
> mdadm: failed to set readonly for /dev/md_d0: Device or resource busy
James --
umount /dev/md_d0 first; you can remount it right after you re-enable
writes.
That should do the trick =)
-- David
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 13:59 ` David M. Strang
@ 2005-09-09 19:59 ` Mr. James W. Laferriere
0 siblings, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 19:59 UTC (permalink / raw)
To: David M. Strang; +Cc: Neil Brown, linux-raid maillist
Hello David , That did work . Thank you again . JimL
umount /directory
mdadm --readonly /dev/mdX
mdadm --readwrite /dev/mdX
mount /directory
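The four steps above can be wrapped in a small helper. The sketch below is only an illustration under stated assumptions: the names run and kick_rebuild and the DRY_RUN switch are made up for this example, the mount point is assumed to have an fstab entry (as the bare mount /directory implies), and the mdadm options are the ones already used in this thread.

```shell
# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = md device, $2 = mount point listed in /etc/fstab.
# Each step runs only if the previous one succeeded.
kick_rebuild() {
    md=$1
    mnt=$2
    run umount "$mnt" &&
    run mdadm --readonly "$md" &&
    run mdadm --readwrite "$md" &&
    run mount "$mnt"
}

# Possible usage (prints the commands without running them):
#   DRY_RUN=1; kick_rebuild /dev/md_d0 /directory
```

If a step fails partway through, the filesystem is left unmounted, so the state should be checked by hand before retrying.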
* Re: Drive fails & raid6 array is not self rebuild .
2005-09-09 7:40 ` Neil Brown
2005-09-09 11:37 ` David M. Strang
@ 2005-09-09 20:07 ` Mr. James W. Laferriere
2005-09-09 20:58 ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
2005-09-09 21:49 ` Drive fails & raid6 array is not self rebuild Neil Brown
1 sibling, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 20:07 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist
Hello Neil , I applied the patches , all were successful . But after a
make clean ; make
I get ... Tia , JimL
..snip...
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:323: error: `nextspare' undeclared (first use in this function)
Assemble.c:323: error: (Each undeclared identifier is reported only once
Assemble.c:323: error: for each function it appears in.)
make: *** [Assemble.o] Error 1
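The identifier in the error, nextspare, is the variable that the first Assemble.c hunk of the mdadm patch declares. One possible explanation, not confirmed in this part of the thread, is that the declaration hunk did not end up in the file being compiled. A minimal sketch for checking that follows; the function name has_nextspare_decl is an illustrative assumption.

```shell
# Succeeds if the given source file contains the declaration that the
# first Assemble.c hunk adds.
has_nextspare_decl() {
    grep -q 'int nextspare = 0;' "$1"
}

# Possible usage:
#   has_nextspare_decl Assemble.c || echo "nextspare declaration missing"
```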
On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>>> What happens if you then
>>> mdadm /dev/md_d0 -a /dev/sda[pqrs]
>>> ??
>> Getting stranger & stranger .
>>
>> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> mdadm: re-added /dev/sdap
> Hmm.. mdadm bug.
>
>> root@devel-0:~ # cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
>> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
>> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
>> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
>> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
>> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>> 1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
>
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.
> Attached are three patches.
> The first two are needed by 2.6.12.5 to make sure resync happens (this
> is particularly a problem for version-1 superblocks) or just upgrade
> to 2.6.13.
> The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
> actually adds all of them, and so that when you --assemble a version-1
> array with spares, the spares actually get included.
> NeilBrown
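A rough illustration of the first mdadm problem Neil describes above (only one of several devices being added): returning from inside the per-device loop after the first successful re-add skips the remaining devices, while `continue` moves on to the next one. This is only a sketch; `struct dev_sketch`, `readd_ok` and `add_all` are hypothetical names, not mdadm's real types or functions.

```c
#include <stddef.h>

/* Hypothetical device-list node; mdadm's real list type differs. */
struct dev_sketch {
    const char *devname;
    int readd_ok;               /* stands in for the re-add ioctl succeeding */
    struct dev_sketch *next;
};

/* Walk the whole list and return how many devices were handled. */
int add_all(struct dev_sketch *devlist)
{
    int handled = 0;
    struct dev_sketch *dv;

    for (dv = devlist; dv; dv = dv->next) {
        if (dv->readd_ok) {
            handled++;
            /* A "return 0;" here would stop after the first device;
             * "continue" goes on to the next one instead. */
            continue;
        }
        /* fall back on a normal add (elided in this sketch) */
        handled++;
    }
    return handled;
}
```

With four devices on the list, as in `/dev/sda[pqrs]`, the loop visits all four rather than stopping at the first.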
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | 3542 Broken Yoke Dr. | Give me Linux |
| babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP |
+------------------------------------------------------------------+
* OT: lilo overwriting partition info ?
From: Mr. James W. Laferriere @ 2005-09-09 20:58 UTC (permalink / raw)
To: linux-raid maillist
Hello All, off topic I know ...
I have a question not related to MD. Have you heard of
complaints about lilo overwriting partition info on disks
after the first 2, if those are in a raid1? Or any mentions
of lilo writing to all 16 disks and causing problems?
Tia, JimL
* Re: Drive fails & raid6 array is not self rebuild .
From: Neil Brown @ 2005-09-09 21:49 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
On Friday September 9, babydr@baby-dragons.com wrote:
>
> Hello Neil, I applied all the patches and all were successful. But after a
> make clean ; make
> I get ... Tia , JimL
> ..snip...
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> Assemble.c:323: error: (Each undeclared identifier is reported only once
> Assemble.c:323: error: for each function it appears in.)
> make: *** [Assemble.o] Error 1
That's odd, as the patch contained:
--- ./Assemble.c~current~ 2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c 2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
struct mdinfo info;
struct mddev_ident_s ident2;
char *avail;
+ int nextspare = 0;
vers = md_get_version(mdfd);
if (vers <= 0) {
Maybe add that bit in by hand??
NeilBrown
* Re: Drive fails & raid6 array is not self rebuild .
From: Mr. James W. Laferriere @ 2005-09-10 0:54 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist
Hello Neil ,
On Sat, 10 Sep 2005, Neil Brown wrote:
> On Friday September 9, babydr@baby-dragons.com wrote:
>>
>> Hello Neil, I applied all the patches and all were successful. But after a
>> make clean ; make
>> I get ... Tia , JimL
>> ..snip...
>> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
>> Assemble.c: In function `Assemble':
>> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
>> Assemble.c:323: error: (Each undeclared identifier is reported only once
>> Assemble.c:323: error: for each function it appears in.)
>> make: *** [Assemble.o] Error 1
>
> That's odd, as the patch contained:
>
> --- ./Assemble.c~current~ 2005-09-05 10:55:01.000000000 +1000
> +++ ./Assemble.c 2005-09-09 16:24:50.000000000 +1000
> @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
> struct mdinfo info;
> struct mddev_ident_s ident2;
> char *avail;
> + int nextspare = 0;
>
> vers = md_get_version(mdfd);
> if (vers <= 0) {
What was missing from my 2.0 sources was the 'char *avail;' line,
so patching failed on that hunk, which I totally missed.
So I hand-entered the above bits, as you suggested.
Now it fails on a warning (???).
I have never heard of failures on warnings before.
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:121: warning: unused variable `avail'
make: *** [Assemble.o] Error 1
Would you please cut a source set to the kernel site,
say as version 2.0a, so I can see the diffs against the
sources I have? Tia, JimL
* Re: Drive fails & raid6 array is not self rebuild .
From: Neil Brown @ 2005-09-10 21:58 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist
[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 2287 bytes --]
On Friday September 9, babydr@baby-dragons.com wrote:
> Hello Neil ,
>
> On Sat, 10 Sep 2005, Neil Brown wrote:
> > On Friday September 9, babydr@baby-dragons.com wrote:
> >>
> >> Hello Neil, I applied all the patches and all were successful. But after a
> >> make clean ; make
> >> I get ... Tia , JimL
> >> ..snip...
> >> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> >> Assemble.c: In function `Assemble':
> >> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> >> Assemble.c:323: error: (Each undeclared identifier is reported only once
> >> Assemble.c:323: error: for each function it appears in.)
> >> make: *** [Assemble.o] Error 1
> >
> > That's odd, as the patch contained:
> >
> > --- ./Assemble.c~current~ 2005-09-05 10:55:01.000000000 +1000
> > +++ ./Assemble.c 2005-09-09 16:24:50.000000000 +1000
> > @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
> > struct mdinfo info;
> > struct mddev_ident_s ident2;
> > char *avail;
> > + int nextspare = 0;
> >
> > vers = md_get_version(mdfd);
> > if (vers <= 0) {
>
> What was missing from my 2.0 sources was the 'char *avail;' line,
> so patching failed on that hunk, which I totally missed.
The 'avail' is for a different independent patch which fixes a raid10
issue.
You can ignore it.
> So I hand-entered the above bits, as you suggested.
>
> Now it fails on a warning (???).
I guess you didn't ignore it.
Just add the 'int nextspare = 0;' to what you had. Don't worry that
the 'char *avail;' isn't there.
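For context on what `nextspare` is for: in the attached patch, a device whose `raid_disk` is -1 (the `i+1 == 0` test, apparently a spare with no assigned role) is given the next free slot at or beyond `raid_disks`, so that spares are kept rather than dropped. A minimal sketch of that slot assignment follows; `slot_for_device` is a hypothetical helper, since the patch does this inline in Assemble().

```c
/* Hypothetical helper: pick the slot index for one device.
 * raid_disk  - role recorded for the device, or -1 if it has none
 * raid_disks - number of active slots in the array
 * nextspare  - running counter, starts at 0 as in the patch
 */
int slot_for_device(int raid_disk, int raid_disks, int *nextspare)
{
    if (raid_disk == -1) {              /* the patch tests "i+1 == 0" */
        /* Spare slots start right after the active slots. */
        if (*nextspare < raid_disks)
            *nextspare = raid_disks;
        return (*nextspare)++;
    }
    return raid_disk;                   /* active devices keep their slot */
}
```

For a 36-disk array, the first two role-less devices would land in slots 36 and 37 under this sketch.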
> Never heard of failures on warnings before .
That would be because of the '-Werror' I put in there to make sure I
don't get lazy about warnings.
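To make the `-Werror` point concrete: with `-Wall`, gcc warns about a local variable that is declared but never used, and `-Werror` turns that warning into an error that stops `make`. The sketch below is a hypothetical stand-in for the situation in Assemble.c (the function name `assemble_sketch` is made up); it compiles cleanly only because of the explicit `(void)avail;` use.

```c
#include <stddef.h>

/* Hypothetical stand-in for Assemble(): 'avail' is declared for a
 * separate patch, but nothing in this function reads it yet. */
int assemble_sketch(int raid_disks)
{
    char *avail = NULL;

    /* Without the next line, "gcc -Wall" reports an unused-variable
     * warning for 'avail', and -Werror promotes it to a hard error. */
    (void)avail;

    return raid_disks;
}
```

The simpler fix in this thread is the one Neil gives: leave the unused declaration out entirely.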
>
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:121: warning: unused variable `avail'
> make: *** [Assemble.o] Error 1
>
> Would you please cut a source set to the kernel site,
> say as version 2.0a, so I can see the diffs against the
> sources I have? Tia, JimL
I hope to do a 2.1 next week. Here is the current complete patch
against 2.0.
NeilBrown
[-- Attachment #2: mdadm.diff --]
[-- Type: application/octet-stream, Size: 6338 bytes --]
diff -ru /var/tmp/mdadm-old/mdadm-2.0/Assemble.c /var/tmp/mdadm-new/mdadm-2.0/Assemble.c
--- /var/tmp/mdadm-old/mdadm-2.0/Assemble.c 2005-08-15 16:31:57.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Assemble.c 2005-09-09 16:24:50.000000000 +1000
@@ -118,6 +118,8 @@
mddev_dev_t tmpdev;
struct mdinfo info;
struct mddev_ident_s ident2;
+ char *avail;
+ int nextspare = 0;
vers = md_get_version(mdfd);
if (vers <= 0) {
@@ -319,6 +321,11 @@
i = devcnt;
else
i = devices[devcnt].raid_disk;
+ if (i+1 == 0) {
+ if (nextspare < info.array.raid_disks)
+ nextspare = info.array.raid_disks;
+ i = nextspare++;
+ }
if (i < 10000) {
if (i >= bestcnt) {
unsigned int newbestcnt = i+10;
@@ -359,6 +366,8 @@
/* now we have some devices that might be suitable.
* I wonder how many
*/
+ avail = malloc(info.array.raid_disks);
+ memset(avail, 0, info.array.raid_disks);
okcnt = 0;
sparecnt=0;
for (i=0; i< bestcnt ;i++) {
@@ -377,13 +386,16 @@
if (devices[j].events+event_margin >=
devices[most_recent].events) {
devices[j].uptodate = 1;
- if (i < info.array.raid_disks)
+ if (i < info.array.raid_disks) {
okcnt++;
- else
+ avail[i]=1;
+ } else
sparecnt++;
}
}
- while (force && !enough(info.array.level, info.array.raid_disks, okcnt)) {
+ while (force && !enough(info.array.level, info.array.raid_disks,
+ info.array.layout,
+ avail, okcnt)) {
/* Choose the newest best drive which is
* not up-to-date, update the superblock
* and add it.
@@ -434,6 +446,7 @@
close(fd);
devices[chosen_drive].events = devices[most_recent].events;
devices[chosen_drive].uptodate = 1;
+ avail[chosen_drive] = 1;
okcnt++;
free(super);
}
@@ -599,7 +612,7 @@
if (runstop == 1 ||
(runstop == 0 &&
- ( enough(info.array.level, info.array.raid_disks, okcnt) &&
+ ( enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt) &&
(okcnt >= req_cnt || start_partial_ok)
))) {
if (ioctl(mdfd, RUN_ARRAY, NULL)==0) {
@@ -627,7 +640,7 @@
fprintf(stderr, Name ": %s assembled from %d drive%s", mddev, okcnt, okcnt==1?"":"s");
if (sparecnt)
fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
- if (!enough(info.array.level, info.array.raid_disks, okcnt))
+ if (!enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt))
fprintf(stderr, " - not enough to start the array.\n");
else {
if (req_cnt == info.array.raid_disks)
diff -ru /var/tmp/mdadm-old/mdadm-2.0/Manage.c /var/tmp/mdadm-new/mdadm-2.0/Manage.c
--- /var/tmp/mdadm-old/mdadm-2.0/Manage.c 2005-08-26 14:49:25.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Manage.c 2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@
if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
if (verbose >= 0)
fprintf(stderr, Name ": re-added %s\n", dv->devname);
- return 0;
+ continue;
}
/* fall back on normal-add */
}
diff -ru /var/tmp/mdadm-old/mdadm-2.0/mdadm.h /var/tmp/mdadm-new/mdadm-2.0/mdadm.h
--- /var/tmp/mdadm-old/mdadm-2.0/mdadm.h 2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/mdadm.h 2005-09-05 10:55:01.000000000 +1000
@@ -291,7 +291,8 @@
extern int same_uuid(int a[4], int b[4], int swapuuid);
/* extern int compare_super(mdp_super_t *first, mdp_super_t *second);*/
extern unsigned long calc_csum(void *super, int bytes);
-extern int enough(int level, int raid_disks, int avail_disks);
+extern int enough(int level, int raid_disks, int layout,
+ char *avail, int avail_disks);
extern int ask(char *mesg);
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super0.c /var/tmp/mdadm-new/mdadm-2.0/super0.c
--- /var/tmp/mdadm-old/mdadm-2.0/super0.c 2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super0.c 2005-09-05 10:55:01.000000000 +1000
@@ -131,6 +131,10 @@
c = map_num(r5layout, sb->layout);
printf(" Layout : %s\n", c?c:"-unknown-");
}
+ if (sb->level == 10) {
+ printf(" Layout : near=%d, far=%d\n",
+ sb->layout&255, (sb->layout>>8)&255);
+ }
switch(sb->level) {
case 0:
case 4:
@@ -234,6 +238,7 @@
info->array.patch_version = sb->patch_version;
info->array.raid_disks = sb->raid_disks;
info->array.level = sb->level;
+ info->array.layout = sb->layout;
info->array.md_minor = sb->md_minor;
info->array.ctime = sb->ctime;
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super1.c /var/tmp/mdadm-new/mdadm-2.0/super1.c
--- /var/tmp/mdadm-old/mdadm-2.0/super1.c 2005-08-26 16:07:33.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super1.c 2005-09-05 10:55:01.000000000 +1000
@@ -180,6 +180,11 @@
c = map_num(r5layout, __le32_to_cpu(sb->layout));
printf(" Layout : %s\n", c?c:"-unknown-");
}
+ if (__le32_to_cpu(sb->level) == 10) {
+ int lo = __le32_to_cpu(sb->layout);
+ printf(" Layout : near=%d, far=%d\n",
+ lo&255, (lo>>8)&255);
+ }
switch(__le32_to_cpu(sb->level)) {
case 0:
case 4:
@@ -290,6 +295,7 @@
info->array.patch_version = 0;
info->array.raid_disks = __le32_to_cpu(sb->raid_disks);
info->array.level = __le32_to_cpu(sb->level);
+ info->array.layout = __le32_to_cpu(sb->layout);
info->array.md_minor = -1;
info->array.ctime = __le64_to_cpu(sb->ctime);
diff -ru /var/tmp/mdadm-old/mdadm-2.0/util.c /var/tmp/mdadm-new/mdadm-2.0/util.c
--- /var/tmp/mdadm-old/mdadm-2.0/util.c 2005-08-17 14:28:38.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/util.c 2005-09-05 10:55:01.000000000 +1000
@@ -118,10 +118,31 @@
return (a*1000000)+(b*1000)+c;
}
-int enough(int level, int raid_disks, int avail_disks)
+int enough(int level, int raid_disks, int layout,
+ char *avail, int avail_disks)
{
+ int copies, first;
switch (level) {
- case 10: return 1; /* a lie, but it is hard to tell */
+ case 10:
+ /* This is the tricky one - we need to check
+ * which actual disks are present.
+ */
+ copies = (layout&255)* (layout>>8);
+ first=0;
+ do {
+ /* there must be one of the 'copies' form 'first' */
+ int n = copies;
+ int cnt=0;
+ while (n--) {
+ if (avail[first])
+ cnt++;
+ first = (first+1) % raid_disks;
+ }
+ if (cnt == 0)
+ return 0;
+
+ } while (first != 0);
+ return 1;
case -4:
return avail_disks>= 1;
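The raid10 case in the util.c hunk above can be read as: walk the slots in consecutive groups of `copies` devices, wrapping around the array, and require at least one available device in every group. Below is a standalone sketch of that check, under a few assumptions: `raid10_enough` is a hypothetical name (the patch folds this into `enough()`), the far count is masked with 255 to match how super0.c prints the layout (the hunk itself uses `layout>>8` unmasked), and the guard against non-positive inputs is an addition.

```c
/* Sketch of the raid10 "are enough devices present?" check.
 * layout - near copies in the low byte, far copies in the next byte
 * avail  - avail[i] is non-zero if the device in slot i is present
 */
int raid10_enough(int raid_disks, int layout, const char *avail)
{
    int near_copies = layout & 255;
    int far_copies = (layout >> 8) & 255;
    int copies = near_copies * far_copies;
    int first = 0;

    if (raid_disks <= 0 || copies <= 0)
        return 0;               /* guard added for this sketch */

    do {
        /* At least one device of each group of 'copies' must be present. */
        int n = copies;
        int cnt = 0;

        while (n--) {
            if (avail[first])
                cnt++;
            first = (first + 1) % raid_disks;
        }
        if (cnt == 0)
            return 0;
    } while (first != 0);

    return 1;
}
```

With 4 disks and a near=2, far=1 layout, losing slots 1 and 2 still leaves one device in each mirrored pair, while losing slots 0 and 1 removes a whole pair and the check fails.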