From: Trevor Cordes <trevor@tecnopolis.ca>
To: linux-raid@vger.kernel.org
Subject: RAID 6 corruption : please help!
Date: Wed, 5 Oct 2005 10:26:38 -0500
Message-ID: <20051005152638.GA24382@pog.tecnopolis.ca>
Hi all,
I'm in a dire situation. I need some advice. If you are knowledgeable,
please help. If you are knowledgeable but busy, I can arrange to pay for
your help.
I was rebuilding my file server to switch some disks around and add new
ones. I have a 2TB RAID6 array. I was removing 2 components of the array
and adding 2 new ones. I'm using FC3 with a 2.6 kernel, and mdadm for all
operations.
I took out the 2 decommissioned drives and put in the 2 new ones.
I hotadded the 2 new ones like: mdadm -a /dev/md3 /dev/hd[qs]2 (mistake
#1?)
md0-2 were still being rebuilt from other work I was doing (they are root,
boot and swap) so mdstat showed md3 as being "delayed" for rebuild. md3
shares 2 disks with the RAID 1 root/boot/swap.
I mounted md3 rw and was able to access/write to it no problems (mistake
#2?).
I had to reboot to change the NIC (it's a long story), and since md0 was
still being rebuilt and md3 had NOT started rebuilding, I thought it would
be ok (mistake #3?).
Rebooted and md0 started rebuilding again. md3 still said it was waiting
before rebuilding.
On boot up I got some very weird behaviour from md3. Logs showed md3 was
operational with 9 of 10 disks (fd:1) including one of the new ones (q)
that had not been synched yet! It also said hds2 was a spare, and it said
hdq2 was operational?! I tried to mount ro and it failed with the usual
filesystem-corrupted errors you get when you're majorly screwed.
If I look at a hexdump of hdq2 and hds2 I can see that some data was
written to these yet-to-be-rebuilt drives... probably in the places where
I was writing when it was mounted rw?
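One way to pin down exactly where those writes landed, rather than eyeballing a hexdump, is to scan each new partition for non-zero blocks. The following is only a rough sketch, and it rests on an assumption not stated above: that the new drives were blank (all zeros) before being hot-added. The function name and the 4096-byte block size are illustrative choices, not anything taken from md or mdadm.

```python
import io

def nonzero_block_ranges(stream, block_size=4096):
    """Return (start, end) byte ranges covering runs of non-zero blocks.

    Assumes untouched regions of the device read back as all zeros,
    which only holds if the drive was blank before it was added.
    """
    ranges = []
    start = None            # start offset of the current non-zero run
    offset = 0              # byte offset of the block being examined
    zero = bytes(block_size)
    while True:
        block = stream.read(block_size)
        if not block:
            break
        is_zero = block == zero[:len(block)]
        if not is_zero and start is None:
            start = offset                  # a non-zero run begins here
        elif is_zero and start is not None:
            ranges.append((start, offset))  # the run ended at this block
            start = None
        offset += len(block)
    if start is not None:
        ranges.append((start, offset))      # run extends to end of stream
    return ranges
```

To use it, open the partition (or, more safely, an image copy of it) read-only in binary mode and pass the file object in. The resulting offsets could then be compared against the array's chunk size and layout to see which stripes were touched.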
The important thing in my mind is I know for sure that no rebuild was ever
started on md3, because md0 was always rebuilding. I ran mdadm --stop on
md3 before md0 ever finished, and I have not brought it back up without
first restarting a rebuild on md0.
I'm trying to figure out what exactly occurred so I can try to undo it.
I'm very good at data recovery and hex editors and such, having saved many
a RAID5 dual-disk failure scenario, even with corrupted partition tables.
I just can't understand what RAID6 was doing.
I know all the data is just sitting there, there's just some wackiness to
the way it's been spread across the disks.
Please help! Please email, I can provide a phone # if you think that will
help you help me. Thanks!