Re: Reduce Timeout on Disk Failure

From: Paul Clements <Paul.Clements@SteelEye.com>
To: jim@rubylane.com
Cc: Andreas Kahnt <aka@aka.coware.de>, linux-raid@vger.kernel.org
Subject: Re: Reduce Timeout on Disk Failure
Date: Tue, 29 Apr 2003 10:06:14 -0400
Message-ID: <3EAE86D6.D4DC56B2@SteelEye.com>
In-Reply-To: 20030429132329.20854.qmail@rome.rubylane.com

jim@rubylane.com wrote:
> 
> If this is patched, I hope it is also put into a 2.2 update.  When a
> SW raid is running, a couple of I/O retries might be reasonable, but
> not heroic recovery attempts that would make good sense in a
> single-disk environment.

Yes, the md driver in 2.2 had a ridiculously large retry loop for when an
I/O failure occurred...if I counted correctly, I think it did 4096 retries
on I/O failure! This usually meant that one of the lower level drivers
ended up hung in a pretty tight error handling loop...

> We did a simple test of powering down an IDE drive that was part of an
> (idle) SW raid, then trying to access the filesystem, and the system
> just locked up.  Maybe it would have eventually come back to life - I
> dunno.

Yep, we tried similar things with a network block device (breaking the
network connection)...we ended up hacking the raid1 and nbd drivers and
inserting schedule() calls just to mitigate the effects of the retries a
little bit...we at least got the system not to hang completely while the
retries were going on... 
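
To illustrate the idea (bound the retries and give up the CPU between
attempts), here is a rough userspace sketch. It is not the md, raid1 or
nbd code and not the actual hack described above: the names io_fn,
submit_with_bounded_retry and flaky_io are made up for the example, and
sched_yield() only stands in for the kernel's schedule().

```c
#include <assert.h>
#include <sched.h>    /* sched_yield(), POSIX */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O callback: returns true on success. */
typedef bool (*io_fn)(void *ctx);

/* Hypothetical result handed back to the caller (e.g. a RAID layer). */
enum io_result { IO_OK, IO_DEVICE_FAILED };

/*
 * Try the I/O once, then retry at most max_retries more times.
 * After every failed attempt, yield the CPU so other tasks can run
 * while the device is misbehaving.
 */
static enum io_result submit_with_bounded_retry(io_fn do_io, void *ctx,
                                                int max_retries)
{
    assert(do_io != NULL && max_retries >= 0);

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (do_io(ctx))
            return IO_OK;
        sched_yield();
    }
    /* Give up quickly and let the caller fail the device. */
    return IO_DEVICE_FAILED;
}

/* Simulated device: fails until *remaining_failures reaches zero. */
static bool flaky_io(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

With a small max_retries, a dead device is reported as failed after a
handful of attempts, so a redundant array can carry on with the
remaining mirror instead of spinning in the error path.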

> For the curious, we haven't upgraded to 2.4x because whenever I check
> the kernel traffic page, it seems there are still important bugs being
> found and corrected - ones we don't want to experience in a production
> setup.

Well, this particular retry problem does not exist in 2.4. And in
general, as far as software RAID is concerned, 2.4 is a lot better...I
know, at least with raid1, you can fail a device just about anytime you
want (with lots of write activity, during a resync, etc.) and as often
as you want, and it doesn't hang...

--
Paul

Thread overview: 5+ messages
2003-04-29 11:04 Reduce Timeout on Disk Failure Andreas Kahnt
2003-04-29 13:23 ` jim
2003-04-29 14:06   ` Paul Clements [this message]
2003-04-29 14:18     ` Lars Marowsky-Bree
2003-05-01 22:09       ` About bad sectors 3tcdgwg3
