From: Alberto Alonso <alberto@ggsys.net>
To: linux-raid@vger.kernel.org
Subject: Re: When does a disk get flagged as bad?
Date: Wed, 30 May 2007 21:49:02 -0500
Message-ID: <1180579742.20508.18.camel@w100>
In-Reply-To: <5822.1180578498@mdt.dhcp.pit.laurelnetworks.com>
On Wed, 2007-05-30 at 22:28 -0400, Mike Accetta wrote:
> Alberto Alonso writes:
> > OK, lets see if I can understand how a disk gets flagged
> > as bad and removed from an array. I was under the impression
> > that any read or write operation failure flags the drive as
> > bad and it gets removed automatically from the array.
> >
> > However, as I indicated in a prior post I am having problems
> > where the array is never degraded. Does an error of type:
> > end_request: I/O error, dev sdb, sector ....
> > not count as a read/write error?
>
> I was also under the impression that any read or write error would
> fail the drive out of the array but some recent experiments with error
> injecting seem to indicate otherwise at least for raid1. My working
> hypothesis is that only write errors fail the drive. Read errors appear
> to just redirect the sector to a different mirror.
>
> I actually ran across what looks like a bug in the raid1
> recovery/check/repair read error logic that I posted about
> last week but which hasn't generated any response yet (cf.
> http://article.gmane.org/gmane.linux.raid/15354). This bug results in
> sending a zero length write request down to the underlying device driver.
> A consequence of issuing a zero length write is that it fails at the
> device level, which raid1 sees as a write failure, which then fails the
> array. The fix I proposed actually has the effect of *not* failing the
> array in this case since the spurious failing write is never generated.
> I'm not sure what is actually supposed to happen in this case. Hopefully,
> someone more knowledgeable will comment soon.
> --
> Mike Accetta
I was starting to think that nobody was getting my posts. I know there
are plenty of people here who understand raid, yet I didn't get answers
to any of my related posts.
After thinking about your post, I guess I can see some logic behind
not failing the drive on a read error, although I would say that after
some number x of read failures a drive should be kicked out no matter
what.
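To make that idea concrete, here is a toy sketch in Python. It is only
a model of the policy being discussed, not the md/raid1 code: the class
names, the method names and the max_read_errors threshold (my "x") are
all made up for illustration, and the "write error fails the drive"
rule is just your working hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    read_errors: int = 0
    failed: bool = False


class Mirror:
    """Toy model of a mirror set; hypothetical, not the kernel's logic."""

    def __init__(self, names, max_read_errors=3):
        self.members = [Member(n) for n in names]
        # Hypothetical threshold: the "x" read failures mentioned above.
        self.max_read_errors = max_read_errors

    def _get(self, name):
        return next(m for m in self.members if m.name == name)

    def on_write_error(self, name):
        # Working hypothesis from this thread: a write error fails the drive.
        self._get(name).failed = True

    def on_read_error(self, name):
        # A read error is counted and the read is redirected to another
        # healthy member; once the count reaches the threshold the drive
        # is failed anyway.
        member = self._get(name)
        member.read_errors += 1
        if member.read_errors >= self.max_read_errors:
            member.failed = True
        return self.pick_other(name)

    def pick_other(self, name):
        # Return the name of some other non-failed member, or None.
        for m in self.members:
            if m.name != name and not m.failed:
                return m.name
        return None

    def degraded(self):
        return any(m.failed for m in self.members)
```

With two members and a threshold of 3, the first two read errors on sdb
would only redirect the read to sda, and the third would mark sdb as
failed and leave the array degraded.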
In my case I believe the errors occur during writes, so it is still
confusing that the array is never degraded.
Unfortunately I've never written any kind of disk I/O code, so I am
afraid I would get completely lost looking at the source.
Alberto