linux-raid.vger.kernel.org archive mirror
From: Mark Bellon <mbellon@mvista.com>
To: David Dougall <davidd@et.byu.edu>
Cc: Gordon Henderson <gordon@drogon.net>, linux-raid@vger.kernel.org
Subject: Re: No response?
Date: Thu, 20 Jan 2005 12:35:29 -0700
Message-ID: <41F00801.2050807@mvista.com>
In-Reply-To: <Pine.LNX.4.58.0501201205490.19586@lewis.et.byu.edu>

David Dougall wrote:

>Oooh, that ~3 second patch sounds very interesting.  I actually think that
>the theory about timeouts causing the problem is correct.  I didn't
>realize that applications/fs calls could stall for that long.  My NFS
>servers have a timeout themselves of about 10 seconds before they start to
>try to shut things down.

I could generate one for 2.4.26 for you, but I need a bit of time: I'm
running 2.4.20 with a great many enhancements, and there are a few
differences. If there is interest, I can post it to linux-raid too.

mark

>--David Dougall
>
>On Thu, 20 Jan 2005, Mark Bellon wrote:
>
>>Gordon Henderson wrote:
>>
>>>On Thu, 20 Jan 2005, David Dougall wrote:
>>>
>>>>Perhaps I was asking a stupid question or an obvious one, but I have
>>>>received no response.
>>>>Maybe if I simplify the question...
>>>>
>>>>If I am running software raid1 and a disk device starts throwing I/O
>>>>errors, is the filesystem supposed to see any indication of this?
>>>
>>>No.
>>>
>>>>I thought software raid would mask all of this and just fail the drive.
>>>
>>>It should.
>>>
>>>>I have servers with xfs as the filesystem, and xfs will start to throw I/O
>>>>errors when a disk starts acting up, even with software raid in between.
>>>>Please advise on how I can confirm my setup or, if this is possibly a bug,
>>>>how to diagnose further.
>>>
>>>I've experienced long delays (30 seconds? It seemed longer) in a system
>>>when a disk fails for a genuine reason (I've deliberately run badblocks
>>>on an md device when I knew one of the underlying devices had genuine bad
>>>blocks). Maybe the md code really tries hard to read the block, or maybe
>>>the underlying device driver does, but in these cases I've seen the
>>>system more or less freeze (all processes accessing that device, anyway)
>>>until the raid code decided to kick the device out of the array.
>>
>>I've seen this too. The worst case can actually last for over 2 minutes.
>>
>>We've been running with a patch to the RAID 1 driver that handles this
>>so critical applications do not hang for too long. Basically it uses
>>timers in the RAID 1 driver to force the disk to be treated as actually
>>having failed if it doesn't respond within a reasonable time (tunable,
>>but usually ~3 seconds). It then handles the I/O requests that come back
>>asynchronously and does the clean-up.
>>
>>>Maybe XFS has a timer and doesn't like devices to "go away" for a long period of time?
>>
>>Not that I know of, but I would need to look. Any XFS wizards' comments?
>>
>>mark
>>
>>>>If it makes a difference, I am running linux-2.4.26
>>>
>>>I've used 2.4.x for a long time - I did try xfs about a year ago, but
>>>wasn't happy with it at all (for various reasons).
>>>
>>>Gordon
>>
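The timer-based RAID 1 patch described in the quoted thread is not reproduced here. As a rough illustration only, the Python sketch below simulates the idea in user space, assuming a simple two-way mirror: a read is sent to one member, and if that member does not answer within a tunable timeout (the thread mentions ~3 seconds), the member is marked failed and the read is retried on the other mirror. The stalled request is left to finish asynchronously and is only recorded for clean-up. SimulatedDisk, TimedMirror, late_completions and the delay/timeout parameters are hypothetical names; this is neither md driver code nor the actual patch.

```python
import concurrent.futures as cf
import threading
import time


class SimulatedDisk:
    """Hypothetical mirror member; 'delay' models a disk that is slow to answer."""

    def __init__(self, name, data, delay=0.0):
        self.name = name
        self.data = data
        self.delay = delay
        self.failed = False

    def read(self, block):
        time.sleep(self.delay)  # a struggling disk may take seconds (or minutes) here
        return self.data[block]


class TimedMirror:
    """Sketch of timeout-based failover for a mirror; not the md/RAID 1 code."""

    def __init__(self, members, timeout=3.0):
        self.members = members
        self.timeout = timeout  # tunable; the thread mentions ~3 seconds
        self.pool = cf.ThreadPoolExecutor(max_workers=len(members))
        self.lock = threading.Lock()
        self.late_completions = []  # members whose timed-out I/O came back later

    def _reap_late(self, disk, future):
        # Runs when a timed-out request finally completes (or errors out).
        # Its result is never handed to the caller; we only record the clean-up.
        with self.lock:
            self.late_completions.append(disk.name)

    def read(self, block):
        for disk in self.members:
            if disk.failed:
                continue
            future = self.pool.submit(disk.read, block)
            try:
                return future.result(timeout=self.timeout)
            except cf.TimeoutError:
                # No answer in time: treat the member as failed right away
                # and let the outstanding request complete asynchronously.
                disk.failed = True
                future.add_done_callback(lambda f, d=disk: self._reap_late(d, f))
            except OSError:
                # A fast, explicit I/O error also fails the member.
                disk.failed = True
        raise OSError("no working mirror member left")


if __name__ == "__main__":
    sda = SimulatedDisk("sda", {0: b"data"}, delay=0.5)  # stalls for 0.5 s
    sdb = SimulatedDisk("sdb", {0: b"data"})             # healthy
    mirror = TimedMirror([sda, sdb], timeout=0.1)
    start = time.monotonic()
    print(mirror.read(0), sda.failed, time.monotonic() - start)
```

Choosing the timeout is the real trade-off: a short one risks failing a disk that is merely slow, while a long one leaves applications (or, as David describes, an NFS server with a ~10 second timeout of its own) waiting on the stalled member.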



Thread overview: 22+ messages
2005-01-20 17:55 No response? David Dougall
2005-01-20 18:12 ` Peter T. Breuer
2005-01-20 18:14 ` Gordon Henderson
2005-01-20 18:37   ` Mark Bellon
2005-01-20 19:15     ` David Dougall
2005-01-20 19:35       ` Mark Bellon [this message]
2005-01-20 19:37     ` Gordon Henderson
2005-01-20 19:41       ` Mark Bellon
2005-01-20 19:49         ` David Dougall
2005-01-20 18:21 ` Mike Hardy
2005-01-20 18:30 ` Mario Holbe
2005-01-20 18:57   ` David Dougall
2005-01-20 19:12     ` Kanoa Withington
2005-01-20 19:17       ` David Dougall
2005-01-20 19:23         ` Guy
2005-01-20 19:34         ` Kanoa Withington
2005-01-20 19:44           ` Mark Bellon
2005-01-20 19:18     ` Guy
2005-01-20 19:24     ` Peter T. Breuer
2005-01-20 19:51       ` David Dougall
2005-01-20 19:28     ` Mark Bellon
2005-01-20 18:49 ` Kanoa Withington
