* strange RAID5 problem
@ 2006-05-09 5:30 Maurice Hilarius
2006-05-09 5:45 ` Neil Brown
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Maurice Hilarius @ 2006-05-09 5:30 UTC (permalink / raw)
To: linux-raid; +Cc: neilb
Good evening.
I am having a bit of a problem with a largish RAID5 set.
Now it is looking more and more like I am about to lose all the data on
it, so I am asking (begging?) to see if anyone can help me sort this out.
Here is the scenario: 16 SATA disks connected to a pair of AMCC (3Ware)
9550SX-12 controllers.
RAID 5, 15 disks, plus 1 hot spare.
SMART started reporting errors on a disk, so it was retired with the
3Ware CLI, then removed and replaced.
The new disk had a JBOD signature added with the 3Ware CLI, then a
single large partition was created with fdisk.
At this point I would expect to be able to add the disk back to the
array by:
[root@box ~]# mdadm /dev/md3 -a /dev/sdw1
But, I get this error message:
mdadm: hot add failed for /dev/sdw1: No such device
What? We just made the partition on sdw a moment ago in fdisk. It IS there!
So, we look around a bit:
# cat /proc/mdstat
md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
sdr1[1]
5860631040 blocks
Yup, that looks correct, missing sdw1[6]
Looking more:
# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.01
Creation Time : Tue Jan 10 19:21:23 2006
Raid Level : raid5
Device Size : 390708736 (372.61 GiB 400.09 GB)
Raid Devices : 16
Total Devices : 15
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Mon May 8 19:33:36 2006
State : active, degraded
Active Devices : 15
Working Devices : 15
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 771aa4c0:48d9b467:44c847e2:9bc81c43
Events : 0.1818687
    Number   Major   Minor   RaidDevice   State
       0       65       1        0        active sync   /dev/sdq1
       1       65      17        1        active sync   /dev/sdr1
       2       65      33        2        active sync   /dev/sds1
       3       65      49        3        active sync   /dev/sdt1
       4       65      65        4        active sync   /dev/sdu1
       5       65      81        5        active sync   /dev/sdv1
     609        0       0        0        removed
       7       65     113        7        active sync   /dev/sdx1
       8       65     129        8        active sync   /dev/sdy1
       9       65     145        9        active sync   /dev/sdz1
      10       65     161       10        active sync   /dev/sdaa1
      11       65     177       11        active sync   /dev/sdab1
      12       65     193       12        active sync   /dev/sdac1
      13       65     209       13        active sync   /dev/sdad1
      14       65     225       14        active sync   /dev/sdae1
      15       65     241       15        active sync   /dev/sdaf1
That also looks to be as expected.
So, let's try to assemble it again and force sdw1 into it:
[root@box ~]# mdadm --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 \
    /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 \
    /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted
[root@box ~]# mdadm --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 \
    /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdx1 /dev/sdy1 /dev/sdz1 \
    /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
mdadm: failed to RUN_ARRAY /dev/md3: Invalid argument
[root@box ~]# mdadm -A /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 \
    /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdx1 /dev/sdy1 /dev/sdz1 \
    /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
mdadm: device /dev/md3 already active - cannot assemble it
[root@box ~]# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid1 hdb3[1] hda3[0]
115105600 blocks [2/2] [UU]
md2 : active raid5 sdp1[15] sdo1[14] sdn1[13] sdm1[12] sdl1[11] sdk1[10]
sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
sda1[0]
5860631040 blocks level 5, 256k chunk, algorithm 2 [16/16]
[UUUUUUUUUUUUUUUU]
md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
sdr1[1]
5860631040 blocks
md0 : active raid1 hdb1[1] hda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
[root@box ~]# mdadm /dev/md3 -a /dev/sdw1
mdadm: hot add failed for /dev/sdw1: No such device
OK, let's mount the degraded RAID and try to copy the files to somewhere
else, so we can make it from scratch:
[root@box ~]# mount /dev/md3 /all/boxw16/
/dev/md3: Invalid argument
mount: /dev/md3: can't read superblock
[root@box ~]# fsck /dev/md3
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
fsck.ext2: Invalid argument while trying to open /dev/md3
The superblock could not be read..
[root@box ~]# mke2fs -n /dev/md3
mke2fs 1.35 (28-Feb-2004)
mke2fs: Device size reported to be zero. Invalid partition specified,
or partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to
reboot to re-read your partition table.
So, now what to do?
Any ideas would be DEEPLY appreciated!
--
Regards,
Maurice
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: strange RAID5 problem
2006-05-09 5:30 strange RAID5 problem Maurice Hilarius
@ 2006-05-09 5:45 ` Neil Brown
2006-05-09 5:58 ` Luca Berra
2006-05-09 6:12 ` strange RAID5 problem CaT
2 siblings, 0 replies; 8+ messages in thread
From: Neil Brown @ 2006-05-09 5:45 UTC (permalink / raw)
To: Maurice Hilarius; +Cc: linux-raid
On Monday May 8, maurice@harddata.com wrote:
> Good evening.
>
> I am having a bit of a problem with a largish RAID5 set.
> Now it is looking more and more like I am about to lose all the data on
> it, so I am asking (begging?) to see if anyone can help me sort this out.
>
Very thorough description, but you omitted the 'dmesg' output
corresponding to :
>
> [root@box ~]# mdadm
> --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
> /dev/sdv1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1
> /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: failed to RUN_ARRAY /dev/md3: Invalid argument
Also, you don't seem to have tried '--force' with '--assemble'. It
might help.
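
For readers following along, the two suggestions above (capture the kernel log, retry with '--force') could be sketched roughly as below. This is only an illustration, not a command sequence posted in the thread: the function name force_assemble and the RUN dry-run variable are hypothetical, and stopping the array first is an assumption based on the "already active" error shown earlier.

```shell
# Sketch only. RUN is a hypothetical dry-run knob: by default the
# commands are printed, not executed; set RUN="" to run them as root.
RUN=${RUN-echo}

force_assemble() {
    md=$1; shift
    # An inactive array still holds its members, so stop it first.
    $RUN mdadm --stop "$md"
    # --force asks mdadm to assemble even if some superblocks look out of date.
    $RUN mdadm --assemble --force "$md" "$@"
    # The kernel log usually says why RUN_ARRAY was refused.
    $RUN dmesg
}

# Example (dry run): force_assemble /dev/md3 /dev/sdq1 /dev/sdr1
```

In the default dry-run mode the function only echoes the three commands, so the member list can be checked before anything touches the disks.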
NeilBrown
* Re: strange RAID5 problem
2006-05-09 5:30 strange RAID5 problem Maurice Hilarius
2006-05-09 5:45 ` Neil Brown
@ 2006-05-09 5:58 ` Luca Berra
2006-05-09 16:16 ` Maurice Hilarius
2006-05-09 6:12 ` strange RAID5 problem CaT
2 siblings, 1 reply; 8+ messages in thread
From: Luca Berra @ 2006-05-09 5:58 UTC (permalink / raw)
To: linux-raid
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>[root@box ~]# mdadm /dev/md3 -a /dev/sdw1
>
>But, I get this error message:
>mdadm: hot add failed for /dev/sdw1: No such device
>
>What? We just made the partition on sdw a moment ago in fdisk. It IS there!
I don't believe you, prove it (/proc/partitions)
>So, we look around a bit:
># cat /proc/mdstat
>
>md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
>sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
>sdr1[1]
> 5860631040 blocks
>
>Yup, that looks correct, missing sdw1[6]
no, it does not, it is 'inactive'
>[root@box ~]# cat /proc/mdstat
>Personalities : [raid1] [raid5]
...
>md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
>sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
>sdr1[1]
> 5860631040 blocks
...
>[root@box ~]# mdadm /dev/md3 -a /dev/sdw1
>mdadm: hot add failed for /dev/sdw1: No such device
>
>OK, let's mount the degraded RAID and try to copy the files to somewhere
>else, so we can make it from scratch:
>
>[root@box ~]# mount /dev/md3 /all/boxw16/
>/dev/md3: Invalid argument
>mount: /dev/md3: can't read superblock
>
it is still inactive, no wonder you cannot access it.
try running the array, or really stop it before assembling.
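
A small sketch of how the 'inactive' state can be spotted before trying to mount anything. It assumes the /proc/mdstat layout shown in this thread ("md3 : inactive ..."); the function name inactive_arrays is made up for illustration.

```shell
# Print the names of md arrays that are listed as inactive.
# Reads mdstat-format text from the file given as $1
# (default: /proc/mdstat).
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}

# Example: inactive_arrays
```

For each name it prints, the options mentioned above would be something like "mdadm --run /dev/md3" to try starting it as it is, or "mdadm --stop /dev/md3" followed by a fresh assemble.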
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: strange RAID5 problem
2006-05-09 5:30 strange RAID5 problem Maurice Hilarius
2006-05-09 5:45 ` Neil Brown
2006-05-09 5:58 ` Luca Berra
@ 2006-05-09 6:12 ` CaT
2 siblings, 0 replies; 8+ messages in thread
From: CaT @ 2006-05-09 6:12 UTC (permalink / raw)
To: Maurice Hilarius; +Cc: linux-raid, neilb
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [root@box ~]# mdadm
> --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
> /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1
> /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted
Have you tried zeroing the superblock with
mdadm --misc --zero-superblock /dev/sdw1
and then adding it in?
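
Spelled out as a sketch, that sequence might look like the following. The helper name readd_member and the RUN dry-run variable are hypothetical, and the hot-add step presumably only succeeds once the array itself is running, given the "No such device" error above.

```shell
# Sketch only. RUN is a hypothetical dry-run knob: by default the
# commands are printed, not executed; set RUN="" to run them as root.
RUN=${RUN-echo}

readd_member() {
    md=$1 part=$2
    # Show what md metadata, if any, is on the replacement partition.
    $RUN mdadm --examine "$part"
    # Erase any stale md superblock so it cannot conflict with the array.
    $RUN mdadm --zero-superblock "$part"
    # Hot-add the now-blank partition; md then rebuilds onto it.
    $RUN mdadm "$md" --add "$part"
}

# Example (dry run): readd_member /dev/md3 /dev/sdw1
```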
> [root@box ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock
Wow, that looks messy. Ummm. About the only thing I can think of is
failing /dev/sdw1 and removing it (I know it says it's not there,
but...)
Also, I'm not the biggest expert on RAID around here. ;)
--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
* Re: strange RAID5 problem
2006-05-09 5:58 ` Luca Berra
@ 2006-05-09 16:16 ` Maurice Hilarius
2006-05-09 19:20 ` Luca Berra
0 siblings, 1 reply; 8+ messages in thread
From: Maurice Hilarius @ 2006-05-09 16:16 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid
Luca Berra wrote:
> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>> [root@box ~]# mdadm /dev/md3 -a /dev/sdw1
>>
>> But, I get this error message:
>> mdadm: hot add failed for /dev/sdw1: No such device
>>
>> What? We just made the partition on sdw a moment ago in fdisk. It IS
>> there!
>
> I don't believe you, prove it (/proc/partitions)
>
>
I understand. Here we go then. Devices in question bracketed with "**":
[root@box ~]# cat /proc/partitions
major minor #blocks name
3 0 117220824 hda
3 1 104391 hda1
3 2 2008125 hda2
3 3 115105725 hda3
3 64 117220824 hdb
3 65 104391 hdb1
3 66 2008125 hdb2
3 67 115105725 hdb3
8 0 390711384 sda
8 1 390708801 sda1
8 16 390711384 sdb
8 17 390708801 sdb1
8 32 390711384 sdc
8 33 390708801 sdc1
8 48 390711384 sdd
8 49 390708801 sdd1
8 64 390711384 sde
8 65 390708801 sde1
8 80 390711384 sdf
8 81 390708801 sdf1
8 96 390711384 sdg
8 97 390708801 sdg1
8 112 390711384 sdh
8 113 390708801 sdh1
8 128 390711384 sdi
8 129 390708801 sdi1
8 144 390711384 sdj
8 145 390708801 sdj1
8 160 390711384 sdk
8 161 390708801 sdk1
8 176 390711384 sdl
8 177 390708801 sdl1
8 192 390711384 sdm
8 193 390708801 sdm1
8 208 390711384 sdn
8 209 390708801 sdn1
8 224 390711384 sdo
8 225 390708801 sdo1
8 240 390711384 sdp
8 241 390708801 sdp1
65 0 390711384 sdq
65 1 390708801 sdq1
65 16 390711384 sdr
65 17 390708801 sdr1
65 32 390711384 sds
65 33 390708801 sds1
65 48 390711384 sdt
65 49 390708801 sdt1
65 64 390711384 sdu
65 65 390708801 sdu1
65 80 390711384 sdv
65 81 390708801 sdv1
**
65 96 390711384 sdw
65 97 390708801 sdw1
**
65 112 390711384 sdx
65 113 390708801 sdx1
65 128 390711384 sdy
65 129 390708801 sdy1
65 144 390711384 sdz
65 145 390708801 sdz1
65 160 390711384 sdaa
65 161 390708801 sdaa1
65 176 390711384 sdab
65 177 390708801 sdab1
65 192 390711384 sdac
65 193 390708801 sdac1
65 208 390711384 sdad
65 209 390708801 sdad1
65 224 390711384 sdae
65 225 390708801 sdae1
65 240 390711384 sdaf
65 241 390708801 sdaf1
**
9 0 104320 md0
**
9 2 5860631040 md2
9 1 115105600 md1
--
Regards,
Maurice
* Re: strange RAID5 problem
2006-05-09 16:16 ` Maurice Hilarius
@ 2006-05-09 19:20 ` Luca Berra
2006-05-09 22:19 ` Maurice Hilarius
0 siblings, 1 reply; 8+ messages in thread
From: Luca Berra @ 2006-05-09 19:20 UTC (permalink / raw)
To: linux-raid
On Tue, May 09, 2006 at 10:16:25AM -0600, Maurice Hilarius wrote:
>Luca Berra wrote:
>> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>>> [root@box ~]# mdadm /dev/md3 -a /dev/sdw1
>>>
>>> But, I get this error message:
>>> mdadm: hot add failed for /dev/sdw1: No such device
>>>
>>> What? We just made the partition on sdw a moment ago in fdisk. It IS
>>> there!
>>
>> I don't believe you, prove it (/proc/partitions)
>>
>>
>I understand. Here we go then. Devices in question bracketed with "**":
>
ok, now i do.
is the /dev/sdw1 device file correctly created?
you could try straceing mdadm to see what happens
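
One way to check the device file, sketched under a few assumptions: Linux, GNU stat (for the -c '%t %T' format), and the /proc/partitions layout posted earlier, where sdw1 is listed as major 65, minor 97. The function names are made up for illustration.

```shell
# Print "major:minor" for a partition name as listed in a
# /proc/partitions-format file ($2, default: /proc/partitions).
expected_devnum() {
    awk -v name="$1" '$4 == name { print $1 ":" $2 }' "${2:-/proc/partitions}"
}

# Print "major:minor" of an existing device node.
# Assumes GNU stat, which reports the numbers in hex.
actual_devnum() {
    hex=$(stat -c '%t %T' "$1") || return 1
    set -- $hex
    printf '%d:%d\n' "0x$1" "0x$2"
}

# Example:
#   [ "$(expected_devnum sdw1)" = "$(actual_devnum /dev/sdw1)" ] \
#       && echo "node matches" || echo "node is wrong or missing"
```

If the two disagree, the node in /dev does not point at the partition the kernel knows about. For the strace idea, something like "strace -f mdadm /dev/md3 -a /dev/sdw1" would show which system call returns the error.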
what about the other suggestion? trying to stop the array and restart
it, since it is marked as inactive.
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: strange RAID5 problem
2006-05-09 19:20 ` Luca Berra
@ 2006-05-09 22:19 ` Maurice Hilarius
2006-05-10 14:54 ` Thanks! Was:[Re: strange RAID5 problem] Maurice Hilarius
0 siblings, 1 reply; 8+ messages in thread
From: Maurice Hilarius @ 2006-05-09 22:19 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid, Neil Brown
Luca Berra wrote:
> ..
>>> I don't believe you, prove it (/proc/partitions)
>>>
>> I understand. Here we go then. Devices in question bracketed with "**":
>>
> ok, now i do.
> is the /dev/sdw1 device file correctly created?
> you could try straceing mdadm to see what happens
>
> what about the other suggestion? trying to stop the array and restart
> it, since it is marked as inactive.
> L.
>
Here is what we ended up doing that fixed it.
Thanks to Neil for the --force suggestion; however, even with that,
ALL parameters had to be given to mdadm -C or it still refused.
We used EVMS to rebuild, as that is what originally created the RAID.
mdadm -C /dev/md3 --chunk=256 --level=5 --parity=ls --raid-devices=16 \
    --force /dev/evms/.nodes/sdq1 /dev/evms/.nodes/sdr1 \
    /dev/evms/.nodes/sds1 /dev/evms/.nodes/sdt1 /dev/evms/.nodes/sdu1 \
    /dev/evms/.nodes/sdv1 missing /dev/evms/.nodes/sdx1 \
    /dev/evms/.nodes/sdy1 /dev/evms/.nodes/sdz1 /dev/evms/.nodes/sdaa1 \
    /dev/evms/.nodes/sdab1 /dev/evms/.nodes/sdac1 /dev/evms/.nodes/sdad1 \
    /dev/evms/.nodes/sdae1 /dev/evms/.nodes/sdaf1
Notice we are creating the array with a "missing" member, and the
devices are listed in the same order as shown by: mdadm -D /dev/md3
This was the *only* way that it would come up. It was mountable, and the
data seems intact.
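
Because the member order matters so much here, a rough sketch of deriving it from saved "mdadm -D" output rather than typing it by hand may be useful. It assumes the table layout shown earlier in the thread (device path as the last field, "removed" for the empty slot) and an array with no spares; member_list is a hypothetical name. Re-creating an array over existing members with the wrong order, chunk size, or layout can destroy the data, so treat the output as something to double-check, not to paste blindly.

```shell
# Turn the device table of "mdadm -D" output into an ordered member
# list, with "missing" in place of a removed slot.
# Reads from the file given as $1, or from stdin if none is given.
member_list() {
    awk '
        $1 ~ /^[0-9]+$/ {
            if ($NF ~ /^\/dev\//)      out = out sep $NF
            else if ($NF == "removed") out = out sep "missing"
            else next
            sep = " "
        }
        END { print out }
    ' "${1:--}"
}

# Example: mdadm -D /dev/md3 | member_list
```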
We started the rebuild with no errors by simply adding the device
as I mentioned before with -a.
Then sped it up via:
echo "100000" > /proc/sys/dev/raid/speed_limit_min
Because frankly we have the resources to do so and need it going as fast
as possible.
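
To keep an eye on the rebuild after raising the limit (the value is in KB/sec per device), a sketch like the one below can pull the progress figure out of /proc/mdstat. It assumes the usual "recovery = N.N%" progress line that md prints while rebuilding; rebuild_progress is a made-up name.

```shell
# Print "<array> <percent>" for every array that is currently
# rebuilding or resyncing. Reads mdstat-format text from the file
# given as $1 (default: /proc/mdstat).
rebuild_progress() {
    awk '
        $2 == ":" { md = $1 }
        /recovery|resync/ && match($0, /[0-9]+\.[0-9]+%/) {
            print md, substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/mdstat}"
}

# Example: rebuild_progress
```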
--
Regards,
Maurice
* Thanks! Was:[Re: strange RAID5 problem]
2006-05-09 22:19 ` Maurice Hilarius
@ 2006-05-10 14:54 ` Maurice Hilarius
0 siblings, 0 replies; 8+ messages in thread
From: Maurice Hilarius @ 2006-05-10 14:54 UTC (permalink / raw)
To: linux-raid
Thanks to Neil, Luca, and CaT, who were all a big help.
--
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue email:maurice@harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3