* Expert opinion on "Recovering from a multiple disk failure"
@ 2003-09-26 21:08 Christof Soehngen
From: Christof Soehngen @ 2003-09-26 21:08 UTC (permalink / raw)
  To: linux-raid

Hello list,

I seem to be in severe trouble: my software RAID 5 is no longer
accessible, and needless to say the data on it is important to me ;)
I use the four disks hde, hdf, hdg and hdh. I'm 100% sure my /etc/raidtab
has the correct, current settings:

raiddev /dev/md0
 raid-level 5
 nr-raid-disks 4
 nr-spare-disks 0
 persistent-superblock 1
 parity-algorithm left-symmetric
 chunk-size 128
 device /dev/hde1
 raid-disk 0
 device /dev/hdf1
 raid-disk 1
 device /dev/hdg1
 raid-disk 2
 device /dev/hdh1
 raid-disk 3

Yesterday there were two power outages. After the first, I saw one of
the disks rebuilding (its HDD LED was on continuously).
After the second outage, md0 was no longer recognised correctly at
startup.

I think the important lines from /var/log/boot.msg are the following:

hdh1's event counter: 0000001c
hdg1's event counter: 0000001c
hdf1's event counter: 0000001a
hde1's event counter: 0000001b
superblock update inconsistency
kicking non-fresh hdf1 from array!
kicking faulty hde1!
not enough operational devices for md0 (2/4 failed)

I then read the suggestions in
http://www.faqs.org/docs/Linux-HOWTO/Software-RAID-HOWTO.html#ss6.1
("Recovering from a multiple disk failure") and experimented a little
with a test RAID system (md1). I found out the following:

- If I create a new RAID for testing (md1 on hdd1 to hdd4), stop it,
damage one disk (I formatted it), and then run "mkraid /dev/md1 --force",
all data is lost.
- If I first mark the damaged disk as "failed-disk" in /etc/raidtab and
then run "mkraid /dev/md1 --force", the RAID is present again, albeit in
degraded mode. A "raidhotadd /dev/md1 /dev/hdd1" would launch a rebuild.

Now my question: from the messages shown above, I conclude that disk
hde1 is damaged and disk hdf1 has a wrong (non-fresh) superblock.
I would do the following:

1. Mark hde1 as failed-disk in /etc/raidtab
2. Do a "mkraid /dev/md0 --force"
3. Do a "raidhotadd /dev/md0 /dev/hde1"
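
For step 1, I assume the hde1 entry in /etc/raidtab would change roughly
as follows, using the failed-disk keyword the same way as in my md1 test.
This is only a sketch based on my assumption that hde1 is the disk to
exclude; only the hde1 entry is shown, and all other lines stay as listed
above:

 device /dev/hde1
 failed-disk 0

The idea is that "raid-disk 0" becomes "failed-disk 0", so that mkraid
leaves hde1 out and assembles md0 in degraded mode from the remaining
three disks.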

What do you think: will this bring my real data back online?
I would really appreciate your help. Thanks in advance,
Christof

