Re: raid1 error handling and faulty drives


From: Mike Accetta <maccetta@laurelnetworks.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid1 error handling and faulty drives
Date: Fri, 07 Sep 2007 18:03:43 -0400
Message-ID: <22375.1189202623@mdt.ecitele.com>
In-Reply-To: Your message of "Thu, 06 Sep 2007 06:23:42 BST." <18143.36574.398690.36732@notabene.brown>

Neil Brown writes:

> On Wednesday September 5, maccetta@laurelnetworks.com wrote:
> > ...
> > 
> > 2) It adds a threshold on the level of recent error activity which is
> >    acceptable in a given interval, all configured through /sys.  If a
> >    mirror has generated more errors in this interval than the threshold,
> >    it is kicked out of the array.
> 
> This is probably a good idea.  It bothers me a little to require 2
> separate numbers in sysfs...
> 
> When we get a read error, we quiesce the device, then try to sort out
> the read errors, so we effectively handle them in batches.  Maybe we
> should just set a number of seconds, and if there are 3 or more
> batches in that number of seconds, we kick the drive... just a thought.
> 

I think I was just trying to be as flexible as possible.  If we were to
use one number, I'd do the opposite: fix the interval and allow the
threshold to be configured.  I tend to think of a disk as bad because it
had more than an expected number of errors in some fixed interval, rather
than because it had a fixed number of errors in less than some expected
interval.  Mathematically the approaches ought to be equivalent.
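
To make the equivalence concrete, here is a minimal sketch of both
formulations.  It is in Python rather than kernel C, and it is not the
code from the patch: ErrorWindow, kicked_by_count and kicked_by_span are
hypothetical names invented for illustration, and error times are assumed
to arrive in ascending order, in seconds.

```python
from collections import deque


class ErrorWindow:
    """Recent read-error timestamps for one mirror (hypothetical helper)."""

    def __init__(self, interval_s, threshold):
        self.interval_s = interval_s  # length of the window, in seconds
        self.threshold = threshold    # errors tolerated within one window
        self.stamps = deque()         # timestamps still inside the window

    def record(self, now):
        """Record an error at `now`; return True if the mirror should be kicked."""
        self.stamps.append(now)
        # Drop errors that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.interval_s:
            self.stamps.popleft()
        return len(self.stamps) > self.threshold


def kicked_by_count(times, interval_s, threshold):
    """'More than `threshold` errors in a fixed interval' formulation."""
    window = ErrorWindow(interval_s, threshold)
    return any([window.record(t) for t in times])


def kicked_by_span(times, interval_s, threshold):
    """'threshold + 1 errors in less than the interval' formulation."""
    k = threshold  # index distance spanning threshold + 1 consecutive errors
    return any(times[i + k] - times[i] < interval_s
               for i in range(len(times) - k))
```

For any ascending list of error times the two functions should agree:
the window holds more than `threshold` errors exactly when some run of
threshold + 1 consecutive errors spans less than the interval.  Which
number is fixed and which is exposed through /sys is then only a question
of which one is easier for an administrator to reason about.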

> > One would think that #2 should not be necessary as the raid1 retry
> > logic already attempts to rewrite and then reread bad sectors and fails
> > the drive if it cannot do both.  However, what we observe is that the
> > re-write step succeeds as does the re-read but the drive is really no
> > more healthy.  Maybe the re-read is not actually going out to the media
> > in this case due to some caching effect?
> 
> I have occasionally wondered if a cache would defeat this test.  I
> wonder if we can push a "FORCE MEDIA ACCESS" flag down with that
> read.  I'll ask.

I looked around for something like this but, as far as I could see, it
isn't implemented.  I couldn't even find an explicit mention of read
caching in any drive specs to begin with.  Read-ahead seems to be the
closest concept.

> Thanks.  I agree that we do need something along these lines.  It
> might be a while before I can give the patch the brainspace it
> deserves as I am travelling this fortnight.

Looking forward to further discussion.  Thank you!
--
Mike Accetta

ECI Telecom Ltd.
Transport Networking Division, US (previously Laurel Networks)

Thread overview: 4+ messages
2007-09-05 22:22 raid1 error handling and faulty drives Mike Accetta
2007-09-06  5:23 ` Neil Brown
2007-09-07 22:03   ` Mike Accetta [this message]
2008-02-26 16:23   ` Philip Molter
