From: Mike Hardy <mhardy@h3c.com>
To: linux-raid@vger.kernel.org
Subject: the dreaded double disk failure
Date: Wed, 12 Jan 2005 23:14:09 -0800
Message-ID: <41E61FC1.6060603@h3c.com>
Alas, I've been bitten.
Worse, it was after attempting to use raidreconf to extend the array
holding my backup, and having it trash that array instead. I know
raidreconf is a use-at-your-own-risk tool, but it was only the backup,
so I didn't mind.
Until I got this (partial mdadm -E output):
      Number   Major   Minor   RaidDevice   State
this     7       91       1        7        active sync   /dev/hds1
   0     0       33       1        0        active sync   /dev/hde1
   1     1       34       1        1        active sync   /dev/hdg1
   2     2       56       1        2        active sync   /dev/hdi1
   3     3       57       1        3        faulty        /dev/hdk1
   4     4       88       1        4        active sync   /dev/hdm1
   5     5       89       1        5        faulty        /dev/hdo1
   6     6       90       1        6        active sync   /dev/hdq1
   7     7       91       1        7        active sync   /dev/hds1
/dev/hdk1 has at least one unreadable block around LBA 3,600,000 or so,
and /dev/hdo1 has at least one unreadable block around LBA 8,000,000 or so.
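For what it's worth, the extent of the damage around those LBAs can be
mapped with a short read-only probe; this is just a sketch (the device
path, starting LBA, and sector count are placeholders for the real
component and the suspect region):

```python
import os

SECTOR = 512  # assuming 512-byte sectors, as on these drives

def scan(path, start_lba, count):
    """Try to read `count` sectors starting at `start_lba`; return the
    list of LBAs that fail with an I/O error. Opens read-only, so it is
    safe to point at a live component like /dev/hdk1."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for lba in range(start_lba, start_lba + count):
            try:
                os.lseek(fd, lba * SECTOR, os.SEEK_SET)
                os.read(fd, SECTOR)
            except OSError:
                bad.append(lba)
    finally:
        os.close(fd)
    return bad

# e.g. scan("/dev/hdk1", 3_590_000, 20_000) to bracket the reported LBA
```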
Further, the array was resyncing when the first bad block hit (power
failure due to construction; yes, it's been one of those days, but the
array was actually in sync). I know that all the data I care about was
static at the time, so barring some fsck cleanup, all the important
blocks should have correct parity.
Which is to say that I think my data exists; it's just a bit far away at
the moment.
The first question is, would you agree?
Assuming it's there, my general plan to get my data out is:
1) resurrect the backup array
2) add one faulty drive to the array, with bad blocks there
(an mdadm assemble with 7 of the 8, forced?)
3) start the backup, fully anticipating the read error and disk ejection
4) add the other faulty drive in, with bad blocks there
(mdadm assemble with 7 of the 8, forced again?)
5) finish the backup
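To make steps 2 and 4 concrete, here is a sketch of the two forced
assemblies, each naming 7 of the 8 members and leaving out one faulty
drive. The exact mdadm invocation is my assumption of the right
incantation, and /dev/md0 is a placeholder for the real array; check
mdadm(8) before running anything:

```python
# Member names taken from the mdadm -E output above.
MEMBERS = ["/dev/hde1", "/dev/hdg1", "/dev/hdi1", "/dev/hdk1",
           "/dev/hdm1", "/dev/hdo1", "/dev/hdq1", "/dev/hds1"]

def assemble_without(faulty, array="/dev/md0"):
    """Command line to force-assemble the array with 7 of the 8
    members, omitting the drive whose bad block is in the way."""
    kept = [d for d in MEMBERS if d != faulty]
    return ["mdadm", "--assemble", "--force", array] + kept

# Step 2: assemble without /dev/hdk1, run the backup until /dev/hdo1's
# bad block ejects it; step 4: stop and re-assemble without /dev/hdo1.
print(" ".join(assemble_without("/dev/hdk1")))
print(" ".join(assemble_without("/dev/hdo1")))
```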
The second question is, does that sound sane? Or is there a better way?
Then, to get the main array healthy, I'm going to take note of which
files kicked out which drives, and clobber them with the backed-up
versions.
Alternatively, how hard would it be to write a utility that inspected
the array, took the LBA(s) of the bad blocks on one component, and
reconstructed them for rewrite via parity? A very smart dd, in a way.
Is that possible?
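The core of such a tool would just be the raid5 identity: an unreadable
chunk is the byte-wise XOR of the same-offset chunks on every other
member of the stripe (data and parity alike). A toy sketch of the
reconstruction step, with stripe geometry, offsets, and the actual
rewrite all hand-waved:

```python
def reconstruct_chunk(good_chunks):
    """Recover a missing raid5 chunk as the byte-wise XOR of the
    corresponding chunks from all surviving members."""
    out = bytearray(len(good_chunks[0]))
    for chunk in good_chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# Toy demonstration with a 3+1 stripe of 4-byte chunks:
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = reconstruct_chunk(data)   # parity is the XOR of the data chunks
lost = data[1]                     # pretend this chunk is unreadable
survivors = [data[0], data[2], parity]
assert reconstruct_chunk(survivors) == lost
```

The "smart dd" part is mapping the bad LBA to its stripe and chunk on
each component, which depends on the chunk size and layout the array was
created with.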
Finally, I heard mention (from Peter Breuer, I think) of a raid5 patch
that tolerates sector read errors and re-writes automagically. Any info
on that would be interesting.
Thanks for your time
-Mike
Thread overview: 5+ messages
2005-01-13 7:14 Mike Hardy [this message]
2005-01-13 8:37 ` the dreaded double disk failure Guy
2005-01-13 8:47 ` Mike Hardy
2005-01-15 3:59 ` Mike Hardy
2005-01-15 22:52 ` raid5 test array creator (was Re: the dreaded double disk failure) Mike Hardy