From: Paul Clements <Paul.Clements@SteelEye.com>
To: jim@rubylane.com
Cc: Andreas Kahnt <aka@aka.coware.de>, linux-raid@vger.kernel.org
Subject: Re: Reduce Timeout on Disk Failure
Date: Tue, 29 Apr 2003 10:06:14 -0400
Message-ID: <3EAE86D6.D4DC56B2@SteelEye.com>
In-Reply-To: <20030429132329.20854.qmail@rome.rubylane.com>

jim@rubylane.com wrote:
>
> If this is patched, I hope it is also put into a 2.2 update. When a
> SW raid is running, a couple of I/O retries might be reasonable, but
> not heroic recovery attempts that would make good sense in a
> single-disk environment.
Yes, the md driver in 2.2 had a ridiculously large retry loop for I/O
failures...if I counted correctly, I think it did 4096 retries on I/O
failure! This usually means that one of the lower-level drivers ends up
hung in a pretty tight error-handling loop...
> We did a simple test of powering down an IDE drive that was part of an
> (idle) SW raid, then trying to access the filesystem, and the system
> just locked up. Maybe it would have eventually come back to life - I
> dunno.
Yep, we tried similar things with a network block device (breaking the
network connection)...we ended up hacking the raid1 and nbd drivers and
inserting schedule() calls just to mitigate the effects of the retries a
little bit...we at least got the system not to hang completely while the
retries were going on...
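The schedule() workaround described above can be illustrated with a
minimal userspace sketch in C. This is not the actual raid1/nbd patch:
retry_io, MAX_IO_RETRIES, dead_device, and flaky_device are hypothetical
names, the cap of 3 retries is an arbitrary assumption, and sched_yield()
stands in for the in-kernel schedule() call.

```c
#include <sched.h>   /* sched_yield(): userspace stand-in for schedule() */
#include <stddef.h>

/* Hypothetical small cap, in contrast to the 4096 retries counted in 2.2. */
#define MAX_IO_RETRIES 3

/* An I/O operation: returns 0 on success, nonzero on failure. */
typedef int (*io_fn)(void *ctx);

/*
 * Call op once, then retry up to max_retries more times on failure.
 * After each failed attempt the CPU is yielded so other tasks can run
 * instead of being starved by a tight error loop.
 * Returns 0 on success, -1 if every attempt failed.
 * If attempts is non-NULL, it receives the number of calls made to op.
 */
static int retry_io(io_fn op, void *ctx, int max_retries, int *attempts)
{
    int tries = 0;
    int rc = -1;

    while (tries <= max_retries) {
        tries++;
        rc = op(ctx);
        if (rc == 0)
            break;
        sched_yield();  /* let other runnable tasks make progress */
    }

    if (attempts)
        *attempts = tries;
    return rc == 0 ? 0 : -1;
}

/* Hypothetical device that always fails, e.g. a powered-down disk. */
static int dead_device(void *ctx)
{
    (void)ctx;
    return -1;
}

/* Hypothetical device that fails *ctx times, then succeeds. */
static int flaky_device(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With a dead device, retry_io(dead_device, NULL, MAX_IO_RETRIES, &n) gives
up after MAX_IO_RETRIES + 1 calls and returns -1, at which point a RAID
layer could mark the device failed rather than keep retrying.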
> For the curious, we haven't upgraded to 2.4x because whenever I check
> the kernel traffic page, it seems there are still important bugs being
> found and corrected - ones we don't want to experience in a production
> setup.
Well, this particular retry problem does not exist in 2.4. And in
general, as far as software RAID is concerned, 2.4 is a lot better...I
know, at least with raid1, you can fail a device just about anytime you
want (with lots of write activity, during a resync, etc.) and as often
as you want, and it doesn't hang...
--
Paul