From: Martin Cracauer <cracauer@cons.org>
To: Neil Brown <neilb@suse.de>
Cc: Chris Allen <chris@cjx.com>, linux-raid@vger.kernel.org
Subject: Re: making raid5 more robust after a crash?
Date: Mon, 20 Mar 2006 12:41:59 -0500
Message-ID: <20060320124159.A50675@cons.org>
In-Reply-To: <17435.9868.249413.113639@cse.unsw.edu.au>; from neilb@suse.de on Sat, Mar 18, 2006 at 08:13:48AM +1100
Neil Brown wrote on Sat, Mar 18, 2006 at 08:13:48AM +1100:
> On Friday March 17, chris@cjx.com wrote:
> > Dear All,
> >
> > We have a number of machines running 4TB raid5 arrays.
> > Occasionally one of these machines will lock up solid and
> > will need power cycling. Often when this happens, the
> > array will refuse to restart with 'cannot start dirty
> > degraded array'. Usually mdadm --assemble --force will
> > get the thing going again - although it will then do
> > a complete resync.
First of all, you need to make sure you can see the kernel messages
from this. If /var/log/messages lives on the affected array, you won't
see the messages explaining what happened, even if the kernel printed them.
What you are seeing is probably similar to a problem I just had: by
using software RAID, you are exposed to errors below the RAID level
that are not disk errors. In my case, a BIOS problem on my board made
the SATA driver run out of space for requests to two of the disks in
my RAID-5 at the same time. The driver had to report an error upstream,
and the RAID software on top of it cannot tell such a non-disk error
from a disk error. It treats everything as a disk error and drops the
affected disks from the array; with errors on requests to two disks,
that is one more failed member than a RAID-5 can survive.
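To make the failure mode concrete, here is a toy model in Python. The
class name `Raid5Model` is hypothetical and this is NOT how md is
implemented; it only captures the two facts above: the RAID layer sees
nothing but "request failed" from the driver, and RAID-5 (one parity
block per stripe) can run with at most one member missing.

```python
# Toy model, not md's code: the RAID layer only sees that a request to a
# member failed, never *why* it failed.
from dataclasses import dataclass, field


@dataclass
class Raid5Model:
    members: int
    faulty: set = field(default_factory=set)

    def io_error(self, member: int) -> None:
        # Media error, cable, or a driver/BIOS resource problem all look
        # the same from here, so the member is marked faulty regardless.
        self.faulty.add(member)

    @property
    def failed(self) -> bool:
        # One parity block per stripe: data can be reconstructed with at
        # most one member missing.
        return len(self.faulty) > 1


# A driver problem that fails requests to two disks at the same time:
array = Raid5Model(members=4)
array.io_error(1)
array.io_error(2)
print(array.failed)  # True: two members dropped
```

Note that nothing in the model depends on the disks actually being bad:
two simultaneous controller-side errors are enough.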
I have more info on my accident here:
http://forums.2cpu.com/showthread.php?t=73705
As I said, you need to keep a log file on a disk that is not part of
the array, or (better) be able to watch kernel messages on the console
when this happens.
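A quick way to check the first point is to look at the device number
of the filesystem holding the log. The sketch below is a hypothetical
helper (not part of mdadm or any RAID tool) and assumes the filesystem
sits directly on an md device, whose block major number is 9
(/dev/md0 is 9:0). LVM, dm-crypt or other layers stacked on top of md
report a different major and will not be detected.

```python
# Hypothetical helper: warn if a log file lives on a filesystem mounted
# directly from a Linux md device. Stacked layers (LVM, dm-crypt) on top
# of md are NOT detected by this simple check.
import os

MD_MAJOR = 9  # block major number of Linux md devices (/dev/md0 is 9:0)


def dev_is_md(st_dev: int) -> bool:
    """Return True if the device number belongs to an md array."""
    return os.major(st_dev) == MD_MAJOR


def log_on_md(path: str = "/var/log/messages") -> bool:
    # st_dev is the device of the filesystem that contains `path`.
    return dev_is_md(os.stat(path).st_dev)


if __name__ == "__main__":
    path = "/var/log/messages"
    if os.path.exists(path) and log_on_md(path):
        print(f"warning: {path} is on an md array; crash messages may be lost")
```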
It sounds to me like you have a problem similar to mine: a software
error above the disks but below the RAID level.
> >
> >
> > My question is: Is there any way I can make the array
> > more robust? I don't mind it losing a single drive and
> > having to resync when we get a lockup - but having to
> > do a forced assemble always makes me nervous, and means
> > that this sort of crash has to be escalated to a senior
> > engineer.
The resync is itself a big problem: if a drive physically fails during
the resync, your array is dead (unless the failed drive is the one
being resynced).
Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/