From: Mark Bellon <mbellon@mvista.com>
To: David Dougall <davidd@et.byu.edu>
Cc: Gordon Henderson <gordon@drogon.net>, linux-raid@vger.kernel.org
Subject: Re: No response?
Date: Thu, 20 Jan 2005 12:35:29 -0700
Message-ID: <41F00801.2050807@mvista.com>
In-Reply-To: <Pine.LNX.4.58.0501201205490.19586@lewis.et.byu.edu>
David Dougall wrote:
>Oooh, that ~3 second patch sounds very interesting. I actually think that
>the theory about timeouts causing the problem is correct. I didn't
>realize that applications/fs calls could stall for that long. My NFS
>servers have a timeout themselves of about 10 seconds before they start to
>try to shut things down.
>
>
I could generate one for 2.4.26 for you, but I need a bit of time: I'm
running a 2.4.20 kernel with a great many enhancements, and there are a
few differences. If there is interest, I can post it to linux-raid too.
mark
>--David Dougall
>
>
>On Thu, 20 Jan 2005, Mark Bellon wrote:
>
>
>
>>Gordon Henderson wrote:
>>
>>
>>
>>>On Thu, 20 Jan 2005, David Dougall wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Perhaps I was asking a stupid question or an obvious one, but I have
>>>>received no response.
>>>>Maybe if I simplify the question...
>>>>
>>>>If I am running software raid1 and a disk device starts throwing I/O
>>>>errors, is the filesystem supposed to see any indication of this?
>>>>
>>>>
>>>>
>>>>
>>>No..
>>>
>>>
>>>
>>>
>>>
>>>>I thought software raid would mask all of this and just fail the drive.
>>>>
>>>>
>>>>
>>>>
>>>It should.
>>>
>>>
>>>
>>>
>>>
>>>>I have servers with xfs as the filesystem and xfs will start to throw I/O
>>>>errors when a disk starts acting up even with software raid in between.
>>>>Please advise on how I can confirm my setup or if this is possibly a bug
>>>>how to diagnose further.
>>>>
>>>>
>>>>
>>>>
>>>I've experienced long delays (30 seconds? It seemed longer) in a system
>>>when a disk fails for a genuine reason. (I've deliberately run badblocks
>>>on an md device when I knew one of the underlying devices had genuine bad
>>>blocks; maybe the md code really tries hard to read the block, or maybe
>>>the underlying device driver tries really hard.) In these cases, I've seen
>>>the system more or less freeze (all processes accessing that device,
>>>anyway) until the raid code decided to kick the device out of the array.
>>>
>>>
>>>
>>>
>>I've seen this too. The worst case can actually last for over 2 minutes.
>>
>>We've been running with a patch to the RAID 1 driver that handles this
>>so critical applications do not hang for too long. Basically it uses
>>timers in the RAID 1 driver to force the disk to be treated as actually
>>having failed if it doesn't respond within a reasonable time (tunable
>>but usually ~3 seconds). It then handles the I/O requests that come
>>back later, asynchronously, and does the cleanup.
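
The timeout idea described above can be illustrated with a small
user-space sketch. This is not the RAID 1 driver patch itself, which is
not shown in this thread; it only mimics the behaviour in Python. The
names Mirror, Raid1Sketch, and late_completions are hypothetical, and the
"disks" are simulated with sleeps. A read is tried on each mirror in turn;
a mirror that does not answer within the tunable timeout is marked as
failed, and its late completion is picked up asynchronously by a callback.

```python
import concurrent.futures
import threading
import time

TIMEOUT_S = 3.0  # tunable; the thread mentions roughly 3 seconds


class Mirror:
    """Hypothetical stand-in for one RAID 1 member disk."""

    def __init__(self, name, delay_s=0.0):
        self.name = name
        self.delay_s = delay_s  # simulated time the disk takes to answer
        self.failed = False

    def read(self, block):
        time.sleep(self.delay_s)  # a slow or dying disk blocks here
        return f"{self.name}:block{block}"


class Raid1Sketch:
    """User-space illustration only; not the kernel md code."""

    def __init__(self, mirrors, timeout_s=TIMEOUT_S):
        self.mirrors = mirrors
        self.timeout_s = timeout_s
        self.late_completions = []  # names of mirrors that answered too late
        self._lock = threading.Lock()
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def _on_late(self, mirror, future):
        # Runs when a timed-out request finally completes: clean up here.
        with self._lock:
            self.late_completions.append(mirror.name)

    def read(self, block):
        for mirror in self.mirrors:
            if mirror.failed:
                continue
            future = self._pool.submit(mirror.read, block)
            try:
                return future.result(timeout=self.timeout_s)
            except concurrent.futures.TimeoutError:
                # No answer in time: treat the disk as failed and move on.
                mirror.failed = True
                future.add_done_callback(
                    lambda f, m=mirror: self._on_late(m, f)
                )
        raise OSError("all mirrors failed")


if __name__ == "__main__":
    # Short timings so the demo finishes quickly.
    raid = Raid1Sketch([Mirror("sda", delay_s=0.3), Mirror("sdb")],
                       timeout_s=0.05)
    print(raid.read(7))
    print("sda failed:", raid.mirrors[0].failed)
```

The caller waits at most timeout_s per mirror instead of however long the
slow disk takes, which is the point of the approach: the application sees
a bounded stall rather than a multi-minute hang.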
>>
>>
>>
>>>Maybe XFS has a timer and doesn't like devices to "go away" for a long period of time?
>>>
>>>
>>>
>>>
>>Not that I know of, but I would need to look. Any comments from the XFS wizards?
>>
>>mark
>>
>>
>>
>>>
>>>
>>>>If it makes a difference, I am running linux-2.4.26
>>>>
>>>>
>>>>
>>>>
>>>I've used 2.4.x for a long time - I did try xfs about a year ago, but
>>>wasn't happy with it at all (for various reasons).
>>>
>>>Gordon
>>>-
>>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>the body of a message to majordomo@vger.kernel.org
>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>>
>>
>>
>>