* Problem recovering a failed RAID5 array with 4 drives.
@ 2007-07-12 13:49 James
2007-07-12 16:44 ` Lennart Sorensen
2007-07-12 22:48 ` Neil Brown
0 siblings, 2 replies; 9+ messages in thread
From: James @ 2007-07-12 13:49 UTC (permalink / raw)
To: linux-kernel
My apologies if this is not the correct forum. If there is a better place to
post this, please advise.
Linux localhost.localdomain 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006
i686 i686 i386 GNU/Linux
(I was planning to upgrade to FC7 this weekend, but that is currently on hold
because of the following.)
I've got a problem with a software RAID5 array using mdadm.
Drive sdc failed, causing sda to appear failed as well. Both drives were marked
as 'spare'.
What follows is a record of the steps I've taken and the results. I'm looking
for some direction/advice to get the data back.
I've tried a few cautious things to bring the array back up with the three
good drives, with no luck.
The last thing I attempted had some limited success. I was able to get all
the drives powered up. I checked the Event count on the three good drives and
they were all equal, so I assumed it would be safe to do the following. I
hope I was not wrong. I issued the following commands to try to bring the
array into a usable state.
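The Event-count check described above can be sketched as a small shell test. The `mdadm --examine` values below are illustrative sample data, not output captured from this array; on a real system they would come from running `mdadm --examine` on the surviving members:

```shell
# Illustrative Events values, one per surviving member superblock.
# On the real system you would populate this with something like:
#   events="$(mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdd1 | grep Events | awk '{print $NF}')"
events="0.847291
0.847291
0.847291"

# All surviving members must agree before recreating/assembling is plausibly safe.
if [ "$(printf '%s\n' "$events" | sort -u | wc -l)" -eq 1 ]; then
    echo "event counts match"
else
    echo "event counts differ - stop"
fi
```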
[]# mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
[]# /sbin/mdadm --misc --test --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Wed Jul 11 08:03:20 2007
Raid Level : raid5
Array Size : 1465175808 (1397.30 GiB 1500.34 GB)
Device Size : 488391936 (465.77 GiB 500.11 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Jul 11 08:03:47 2007
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : e46beb22:37d329db:dd16ea76:29c07a23
Events : 0.2
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 1 2 active sync /dev/sda1
3 8 49 3 active sync /dev/sdd1
[]# mdadm --fail /dev/md0 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0
[]# /sbin/mdadm --misc --test --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Wed Jul 11 08:03:20 2007
Raid Level : raid5
Array Size : 1465175808 (1397.30 GiB 1500.34 GB)
Device Size : 488391936 (465.77 GiB 500.11 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Jul 11 14:37:56 2007
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : e46beb22:37d329db:dd16ea76:29c07a23
Events : 0.3
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 0 0 1 removed
2 8 1 2 active sync /dev/sda1
3 8 49 3 active sync /dev/sdd1
4 8 33 - faulty spare /dev/sdc1
[]# mount /dev/md0 /opt
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
In /var/log/messages
Jul 11 14:32:44 localhost kernel: EXT3-fs: md0: couldn't mount because of
unsupported optional features (4000000).
[]# /sbin/fsck /dev/md0
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
fsck.ext3: Filesystem revision too high while trying to open /dev/md0
The filesystem revision is apparently too high for this version of e2fsck.
(Or the filesystem superblock is corrupt)
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
[]# mke2fs -n /dev/md0
mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
183156736 inodes, 366293952 blocks
18314697 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=369098752
11179 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
I tried the following for each of the superblock backups, with the same result.
[]# e2fsck -b 214990848 /dev/md0
e2fsck 1.38 (30-Jun-2005)
/sbin/e2fsck: Invalid argument while trying to open /dev/md0
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
Any advice/direction would be appreciated.
Thanks much.
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 13:49 Problem recovering a failed RAID5 array with 4 drives James
@ 2007-07-12 16:44 ` Lennart Sorensen
2007-07-12 20:21 ` James
2007-07-12 22:48 ` Neil Brown
1 sibling, 1 reply; 9+ messages in thread
From: Lennart Sorensen @ 2007-07-12 16:44 UTC (permalink / raw)
To: James; +Cc: linux-kernel
On Thu, Jul 12, 2007 at 08:49:15AM -0500, James wrote:
> My apologies if this is not the correct forum. If there is a better place to
> post this please advise.
>
>
> Linux localhost.localdomain 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006
> i686 i686 i386 GNU/Linux
>
> (I was planning to upgrade to FC7 this weekend, but that is currently on hold
> because-)
>
> I've got a problem with a software RIAD5 using mdadm.
> Drive sdc failed causing sda to appear failed. Both drives where marked
> as 'spare'.
>
> What follows is a record of the steps I've taken and the results. I'm looking
> for some direction/advice to get the data back.
>
>
> I've tried a few cautions things to bring the array back up with the three
> good drives with no luck.
>
> The last thing attempted had some limited success. I was able to get all
> drives powered up. I checked the Event count on the three good drives and
> they were all equal. So I assumed it would be safe to do the following. I
> hope I was not wrong. I issued the following commands to try to bring the
> array into a usable state.
>
>
>
>
> []#
> mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Don't you want assemble rather than create if it already exists?
How did two drives fail at the same time? Are you running PATA drives
with two drives on a single cable? That is a no-no for RAID. PATA
drive failures often take out the bus, and you never want two drives in a
single RAID to share an IDE bus.
You probably want to try to assemble the non-failed drives, and then
add in the new replacement drive afterwards, since after all it is NOT
clean. Hopefully the RAID will accept sda back even though it appeared
failed. Then you can add the new sdc to resync the RAID.
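For concreteness, the sequence Len describes might look like the following dry run. Each command is only echoed, never executed, and the device names are the ones from this thread (they may differ on another system):

```shell
# Step 1: force-assemble from the three non-failed members, leaving out
# the dead sdc1. Remove the leading 'echo' to actually run it.
assemble_cmd="mdadm --assemble --force --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1"
echo "$assemble_cmd"

# Step 2: after the degraded array is up and the data has been checked,
# hot-add the replacement drive so md rebuilds onto it.
add_cmd="mdadm --add /dev/md0 /dev/sdc1"
echo "$add_cmd"
```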
--
Len Sorensen
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 16:44 ` Lennart Sorensen
@ 2007-07-12 20:21 ` James
2007-07-12 21:41 ` Phil Turmel
0 siblings, 1 reply; 9+ messages in thread
From: James @ 2007-07-12 20:21 UTC (permalink / raw)
To: linux-kernel
> On Thu, Jul 12, 2007 at 08:49:15AM -0500, James wrote:
> > My apologies if this is not the correct forum. If there is a better place to
> > post this please advise.
> >
> >
> > Linux localhost.localdomain 2.6.17-1.2187_FC5 #1 Mon Sep 11 01:17:06 EDT 2006
> > i686 i686 i386 GNU/Linux
> >
> > (I was planning to upgrade to FC7 this weekend, but that is currently on hold
> > because-)
> >
> > I've got a problem with a software RIAD5 using mdadm.
> > Drive sdc failed causing sda to appear failed. Both drives where marked
> > as 'spare'.
> >
> > What follows is a record of the steps I've taken and the results. I'm looking
> > for some direction/advice to get the data back.
> >
> >
> > I've tried a few cautions things to bring the array back up with the three
> > good drives with no luck.
> >
> > The last thing attempted had some limited success. I was able to get all
> > drives powered up. I checked the Event count on the three good drives and
> > they were all equal. So I assumed it would be safe to do the following. I
> > hope I was not wrong. I issued the following commands to try to bring the
> > array into a usable state.
> >
> >
> >
> >
> > []# mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
>
> Don't you want assemble rather than create if it already exists?
>
> How did two drives fail at the same time? Are you running PATA drives
> with two drives on a single cable? That is a no no for raid. PATA
> drive failures often take out the bus and you never want two drives in a
> single raid to share an IDE bus.
>
> You probably want to try and assemble the non failed drives, and then
> add in the new replacement drive afterwards, since after all it is NOT
> clean. Hopefully the raid will accept back sda even though it appeared
> failed. Then you can add the new sdc to resync the raid.
>
> --
> Len Sorensen
>
I should have included more information. When I attempted to --assemble the
array I received the following:
[]# mdadm --assemble [--force --run] /dev/md0 /dev/sda1 /dev/sdb1
[/dev/sdc1] /dev/sdd1
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
From what I read, I assumed I could use the --assume-clean option with --create
to bring the array back at least in some semblance of working order.
I'd like to recover as much as possible from the RAID array. I actually have a
nice new SATA configuration sitting here waiting to receive the data. This
thing failed a day too early. I'm gnashing my teeth over this one.
I'd truly appreciate any help/advice.
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 20:21 ` James
@ 2007-07-12 21:41 ` Phil Turmel
0 siblings, 0 replies; 9+ messages in thread
From: Phil Turmel @ 2007-07-12 21:41 UTC (permalink / raw)
To: LinuxKernel; +Cc: linux-kernel
James wrote:
[snip /]
>>On Thu, Jul 12, 2007 at 08:49:15AM -0500, James wrote:
>>>I've tried a few cautions things to bring the array back up with the three
>>>good drives with no luck.
>>>
[snip /]
>
> mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
>
[snip /]
>
> I should have included more information. When I attempted to --assemble the
> array I received the following:
>
> []# mdadm --assemble [--force --run] /dev/md0 /dev/sda1 /dev/sdb1
> [/dev/sdc1] /dev/sdd1
> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
>
>
> From what I read I assumed I could use the --assume-clean option with --create
> to bring the array back at least in some semblance of working order.
>
> I'd like to recover as much as possible from the RAID array. I actually have a
> nice new SATA configuration sitting here waiting to receive the data. This
> thing failed a day too early. I'm gnashing my teeth over this one.
>
> I'd truly appreciate any help/advice.
>
Hi James,
mdadm allows you to specify "missing" in place of a failed device
when assembling or creating arrays, like so:
mdadm --assemble /dev/md0 --run \
/dev/sda1 /dev/sdb1 missing /dev/sdd1
I don't know if using --create has already trashed your array,
but this is worth a try. You may also want to try --force with
the above.
HTH,
Phil
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 13:49 Problem recovering a failed RAID5 array with 4 drives James
2007-07-12 16:44 ` Lennart Sorensen
@ 2007-07-12 22:48 ` Neil Brown
2007-07-12 23:10 ` James
1 sibling, 1 reply; 9+ messages in thread
From: Neil Brown @ 2007-07-12 22:48 UTC (permalink / raw)
To: LinuxKernel; +Cc: linux-kernel
On Thursday July 12, LinuxKernel@jamesplace.net wrote:
>
> []# mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
>
snip
>
> Number Major Minor RaidDevice State
> 0 8 17 0 active sync /dev/sdb1
> 1 8 33 1 active sync /dev/sdc1
> 2 8 1 2 active sync /dev/sda1
> 3 8 49 3 active sync /dev/sdd1
Something looks very wrong here. You listed the devices to --create
in one order:
a b c d
but they appear in the array in a different order
b c a d
Did you cut/paste the command line into the mail, or did you retype
it? If you retyped it, could you have got it wrong?
You need the order that --detail shows to match the order of the
original array....
NeilBrown
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 22:48 ` Neil Brown
@ 2007-07-12 23:10 ` James
2007-07-12 23:21 ` Neil Brown
0 siblings, 1 reply; 9+ messages in thread
From: James @ 2007-07-12 23:10 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
On Thu July 12 2007 5:48 pm, you wrote:
> On Thursday July 12, LinuxKernel@jamesplace.net wrote:
> >
> > []# mdadm --create --verbose /dev/md0 --assume-clean --level=raid5 --raid-devices=4 --spare-devices=0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> >
> snip
> >
> > Number Major Minor RaidDevice State
> > 0 8 17 0 active sync /dev/sdb1
> > 1 8 33 1 active sync /dev/sdc1
> > 2 8 1 2 active sync /dev/sda1
> > 3 8 49 3 active sync /dev/sdd1
>
> Something looks very wrong here. You listed the devices to --create
> in one order:
> a b c d
> but that appear in the array in a different order
> b c a d
>
> Did you cut/paste the command line into the mail, or did you retype
> it? If you retyped it, could you have got it wrong?
>
> You need the order that --detail shows to match the order of the
> original array....
>
> NeilBrown
>
>
I don't know the original order of the array before all the problems started.
Is there a way to determine the original order?
The order that --detail is showing now is the order that appeared after
issuing the command as it is in the email (i.e., a b c d).
Thanks again.
* Re: Problem recovering a failed RAID5 array with 4 drives.
2007-07-12 23:10 ` James
@ 2007-07-12 23:21 ` Neil Brown
2007-07-13 0:49 ` Problem recovering a failed RAID5 array with 4 drives. --RESOLVED James
0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2007-07-12 23:21 UTC (permalink / raw)
To: LinuxKernel; +Cc: linux-kernel
On Thursday July 12, LinuxKernel@jamesplace.net wrote:
>
> I don't know the original order of the array before all the problems started.
>
> Is there a way to determine the original order?
No, unless you have some old kernel logs of the last time it assembled
the array properly.
The one thing that "--create" does destroy is the information about
any previous array that the drives were a part of.
>
> The order that --detail is showing now is the order that appeared after
> issuing the command is it is in the email. (ie: a b c d)
Odd. I cannot reproduce it.
I suggest you try different arrangements (of the 3 good drives and the
word 'missing') until you find one that 'fsck -n' likes.
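Neil's trial-and-error search can be automated. The sketch below is an assumption about how one might script it, not a command from the thread: it prints one candidate `mdadm --create ... missing ...` plus `fsck -n` line for every ordering of the three good drives and the word 'missing' - 24 permutations in all - without touching the disks:

```shell
# Enumerate all orderings of the surviving members plus 'missing'.
# Nothing is executed; each printed line is a candidate to run by hand.
devices="/dev/sda1 /dev/sdb1 /dev/sdd1 missing"
n=0
for a in $devices; do
  for b in $devices; do
    for c in $devices; do
      for d in $devices; do
        # keep only orderings where all four slots are distinct
        [ "$a" != "$b" ] && [ "$a" != "$c" ] && [ "$a" != "$d" ] &&
        [ "$b" != "$c" ] && [ "$b" != "$d" ] && [ "$c" != "$d" ] || continue
        n=$((n + 1))
        echo "mdadm --create /dev/md0 --assume-clean --level=raid5" \
             "--raid-devices=4 $a $b $c $d && fsck -n /dev/md0"
      done
    done
  done
done
echo "$n candidate orderings"
```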
NeilBrown
* Re: Problem recovering a failed RAID5 array with 4 drives. --RESOLVED
2007-07-12 23:21 ` Neil Brown
@ 2007-07-13 0:49 ` James
2007-07-16 15:04 ` David Greaves
0 siblings, 1 reply; 9+ messages in thread
From: James @ 2007-07-13 0:49 UTC (permalink / raw)
To: linux-kernel; +Cc: Neil Brown
> >
> > I don't know the original order of the array before all the problems started.
> >
> > Is there a way to determine the original order?
>
> No, unless you have some old kernel logs of the last time it assembled
> the array properly.
> The one thing that "--create" does destroy is the information about
> any previous array that the drives were a part of.
>
> >
> > The order that --detail is showing now is the order that appeared after
> > issuing the command is it is in the email. (ie: a b c d)
>
> Odd. I cannot reproduce it.
> I suggest you try different arrangements (of the 3 good drives and the
> word 'missing') until you find one that 'fsck -n' likes.
>
> NeilBrown
>
>
I don't understand how the order of --detail was different from the command
line on my system, however....
YOU ARE A LIFE SAVER!!!
After going through 21 combinations, beginning to lose all hope and plummeting
into eternal despair, combo 22 worked. The array is up and working. All the
data (1.3 TB) is there, and I'm probably the happiest character on the mailing
list today.
Thanks a bunch for your help.
* Re: Problem recovering a failed RAID5 array with 4 drives. --RESOLVED
2007-07-13 0:49 ` Problem recovering a failed RAID5 array with 4 drives. --RESOLVED James
@ 2007-07-16 15:04 ` David Greaves
0 siblings, 0 replies; 9+ messages in thread
From: David Greaves @ 2007-07-16 15:04 UTC (permalink / raw)
To: LinuxKernel; +Cc: linux-kernel, Neil Brown
James wrote:
>>> I don't know the original order of the array before all the problems started.
>>> Is there a way to determine the original order?
>> No, unless you have some old kernel logs of the last time it assembled
>> the array properly.
>> The one thing that "--create" does destroy is the information about
>> any previous array that the drives were a part of.
>>
>>> The order that --detail is showing now is the order that appeared after
>>> issuing the command is it is in the email. (ie: a b c d)
>> Odd. I cannot reproduce it.
>> I suggest you try different arrangements (of the 3 good drives and the
>> word 'missing') until you find one that 'fsck -n' likes.
>>
>> NeilBrown
>>
>>
>
> I don't understand how the order of --detail was different than the command
> line on my system, however....
>
> YOU ARE A LIFE SAVER!!!
>
> After going through 21 combinations, beginning to lose all hope and plummeting
> into eternal despair, combo 22 worked. The array is up and working. All the
> data (1.3Tb) is there and I'm probably the happiest character on the mail
> list today.
>
> Thanks a bunch for your help.
Funnily enough, someone else was having a similar problem on the linux-raid list
at the same time.
Here's a script that may be useful to others in this predicament - a hell of a
lot quicker than doing it by hand...
The 'is the filesystem safe' test probably wants improving from a read-only mount...
http://linux-raid.osdl.org/index.php/RAID_Recovery
http://linux-raid.osdl.org/index.php/Permute_array.pl
David