From: Tejun Heo <htejun@gmail.com>
To: Mark Lord <mlord@pobox.com>
Cc: Ric Wheeler <ric@emc.com>, Linux-ide <linux-ide@vger.kernel.org>,
Jeff Garzik <jgarzik@pobox.com>
Subject: Re: faulty disk testing
Date: Tue, 05 Sep 2006 16:08:40 +0200
Message-ID: <44FD84E8.8000705@gmail.com>
In-Reply-To: <44FD803B.3040000@pobox.com>
Hello, Mark.
Mark Lord wrote:
> Sure it does. It can determine the number of consecutive failures on
> the same drive/channel, and it can also count intervening successes, if any.
>
> From that, at a minimum, it could notice that the same drive has gone 'round
> the error treadmill (say) 20 times in a row, with no other I/O possible on it
> because it has yet to successfully complete the reset+reinit phase.
If a device fails the reset+reinit phase a few times, libata does drop
the device, but I don't think the kernel can drop a device just because
it failed, say, 20 consecutive I/O commands while it still responds to
reset and reinit. That's where policy needs to come in, IMHO.
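
To make the mechanism/policy split concrete, here is a rough sketch in C.
It is purely illustrative and is not libata code: the struct names, the
function names, and the thresholds are all hypothetical. Repeated
reset+reinit failures always take the device offline, while consecutive
I/O failures do so only if a policy explicitly asks for it.

```c
#include <stdbool.h>

/* Hypothetical per-device counters; none of these names exist in libata. */
struct dev_fault_stats {
    unsigned int consecutive_io_failures;    /* failed commands since last success */
    unsigned int consecutive_reset_failures; /* failed reset+reinit attempts in a row */
};

/* Hypothetical thresholds, chosen by whoever sets the policy. */
struct fault_policy {
    unsigned int max_reset_failures; /* must be >= 1; device no longer answers */
    unsigned int max_io_failures;    /* 0 = never offline on I/O errors alone */
};

enum fault_action { FAULT_KEEP, FAULT_OFFLINE };

/* A success breaks the streak, so intervening successes are accounted for. */
static void record_io_result(struct dev_fault_stats *s, bool ok)
{
    if (ok)
        s->consecutive_io_failures = 0;
    else
        s->consecutive_io_failures++;
}

static void record_reset_result(struct dev_fault_stats *s, bool ok)
{
    if (ok)
        s->consecutive_reset_failures = 0;
    else
        s->consecutive_reset_failures++;
}

static enum fault_action decide(const struct dev_fault_stats *s,
                                const struct fault_policy *p)
{
    /* Mechanism: a device that cannot be reset and reinitialized is dropped. */
    if (s->consecutive_reset_failures >= p->max_reset_failures)
        return FAULT_OFFLINE;

    /* Policy: offlining a device that still resets fine is opt-in. */
    if (p->max_io_failures != 0 &&
        s->consecutive_io_failures >= p->max_io_failures)
        return FAULT_OFFLINE;

    return FAULT_KEEP;
}
```

With max_io_failures left at 0, a device that fails 20 commands in a row
but still answers reset and reinit stays online; setting it to 20 gives
the behaviour Mark describes.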
For Ric's case, I'm waiting for more info. If EH is looping forever
without reporting to the upper layer, it definitely needs fixing, but I
don't think that's the case.
> Such a drive is a candidate for pushing the error upstairs,
> and possibly for getting offlined.
>
> Fancier fault-handling is also possible, but the bare minimum is that we
> must not get stuck forever looping in the EH code. Eventually a failed
> status has to be returned to the layers above, I think.
Errors are always pushed upstairs. libata itself doesn't initiate any
kind of retry. That's up to the high-level driver - in this case, sd.
One of the problems is that libata EH can currently take some minutes
to recover from an error condition. With partial request retry from sd,
a batch of consecutive bad sectors can make recovery take a really
long time. This needs fixing.
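
A back-of-the-envelope sketch of why this adds up, assuming the worst
case where every bad sector in the batch fails on its own and costs one
full EH pass before sd retries the remainder of the request. The function
name and the figures in the note below are made up for illustration; they
are not measurements.

```c
/*
 * Hypothetical worst-case estimate, not kernel code.
 *
 * bad_sectors       - consecutive bad sectors hit by one large request
 * eh_secs_per_error - time one EH recovery pass takes, in seconds
 *
 * Assumes each bad sector triggers exactly one EH pass, because sd
 * retries the rest of the request and then hits the next bad sector.
 */
static unsigned long worst_case_recovery_secs(unsigned long bad_sectors,
                                              unsigned long eh_secs_per_error)
{
    return bad_sectors * eh_secs_per_error;
}
```

For example, 100 bad sectors at an assumed 60 seconds per EH pass would
come to 6000 seconds, i.e. 100 minutes for a single request.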
Thanks.
--
tejun