Re: When does a disk get flagged as bad?

From: Alberto Alonso <alberto@ggsys.net>
To: linux-raid@vger.kernel.org
Subject: Re: When does a disk get flagged as bad?
Date: Wed, 30 May 2007 21:49:02 -0500
Message-ID: <1180579742.20508.18.camel@w100>
In-Reply-To: <5822.1180578498@mdt.dhcp.pit.laurelnetworks.com>

On Wed, 2007-05-30 at 22:28 -0400, Mike Accetta wrote:
> Alberto Alonso writes:
> > OK, let's see if I can understand how a disk gets flagged
> > as bad and removed from an array. I was under the impression
> > that any read or write operation failure flags the drive as
> > bad and it gets removed automatically from the array.
> > 
> > However, as I indicated in a prior post I am having problems
> > where the array is never degraded. Does an error of type:
> > end_request: I/O error, dev sdb, sector ....
> > not count as a read/write error?
> 
> I was also under the impression that any read or write error would
> fail the drive out of the array but some recent experiments with error
> injecting seem to indicate otherwise at least for raid1.  My working
> hypothesis is that only write errors fail the drive.  Read errors appear
> to just redirect the sector to a different mirror.
> 
> I actually ran across what looks like a bug in the raid1
> recovery/check/repair read error logic that I posted about
> last week but which hasn't generated any response yet (cf.
> http://article.gmane.org/gmane.linux.raid/15354).  This bug results in
> sending a zero length write request down to the underlying device driver.
> A consequence of issuing a zero length write is that it fails at the
> device level, which raid1 sees as a write failure, which then fails the
> array.  The fix I proposed actually has the effect of *not* failing the
> array in this case since the spurious failing write is never generated.
> I'm not sure what is actually supposed to happen in this case.  Hopefully,
> someone more knowledgeable will comment soon.
> --
> Mike Accetta

I was starting to think that nobody got my posts. I know there are
plenty of people here who understand RAID, yet I didn't get any
answers to any of my related posts.

After thinking about your post, I guess I can see some logic behind
not failing the drive on a read error, although I would say that after
some number of read failures a drive should be kicked out no matter what.
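
To make that idea concrete, here is a toy sketch in Python of the policy
I have in mind. It is not how md is implemented, and I have not read the
raid1 code. The class names, the method names and the limit of 3 are all
made up for illustration. It only models Mike's working hypothesis (a
write error fails the drive, a read error is redirected to another
mirror) plus a read error counter that kicks the drive once it reaches
a limit.

```python
from dataclasses import dataclass

READ_ERROR_LIMIT = 3  # hypothetical threshold, not a real md setting


@dataclass
class Mirror:
    name: str
    read_errors: int = 0
    failed: bool = False


@dataclass
class ToyRaid1:
    mirrors: list
    read_error_limit: int = READ_ERROR_LIMIT

    def active(self):
        """Mirrors that have not been failed out of the array."""
        return [m for m in self.mirrors if not m.failed]

    def on_write_error(self, mirror):
        # Working hypothesis from this thread: a write error fails the drive.
        mirror.failed = True

    def on_read_error(self, mirror):
        # Count the error and kick the drive once it reaches the limit.
        mirror.read_errors += 1
        if mirror.read_errors >= self.read_error_limit:
            mirror.failed = True
        # Redirect the read to another mirror that is still active, if any.
        others = [m for m in self.active() if m is not mirror]
        return others[0] if others else None

    def degraded(self):
        return len(self.active()) < len(self.mirrors)


if __name__ == "__main__":
    array = ToyRaid1([Mirror("sda"), Mirror("sdb")])
    sdb = array.mirrors[1]
    for _ in range(READ_ERROR_LIMIT):
        array.on_read_error(sdb)
    print("degraded:", array.degraded())
```

With a policy like this, a drive that keeps returning read errors would
eventually be removed instead of staying in the array forever.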

In my case I believe the errors happen during writes, which makes it
even more confusing that the drive never gets failed.
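
For reference, a minimal sketch of how one could check whether md
considers an array degraded, by parsing text in the format of
/proc/mdstat. The sample text is made up for illustration and is not
output from my machine, and the function name is hypothetical. It relies
on the status field "[n/m] [UU]", where n is the number of devices in
the array, m is the number currently active, and "_" marks a missing
or failed member.

```python
import re

# Made-up sample in /proc/mdstat format, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      4192896 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of arrays with fewer active devices than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system the text would come from reading /proc/mdstat instead
of the sample string.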

Unfortunately I've never written any kind of disk I/O code, so I am
afraid I would get completely lost if I looked at the code.

Alberto
