From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Bellon
Subject: Re: No response?
Date: Thu, 20 Jan 2005 11:37:07 -0700
Message-ID: <41EFFA53.3030809@mvista.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Gordon Henderson
Cc: David Dougall , linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Gordon Henderson wrote:

>On Thu, 20 Jan 2005, David Dougall wrote:
>
>>Perhaps I was asking a stupid question or an obvious one, but I have
>>received not response.
>>Maybe if I simplify the question...
>>
>>If I am running software raid1 and a disk device starts throwing I/O
>>errors, Is the filesystem supposed to see any indication of this?
>
>No..
>
>> I
>>thought software raid would mask all of this and just fail the drive.
>
>It should.
>
>>I have servers with xfs as the filesystem and xfs will start to throw I/O
>>errors when a disk starts acting up even with software raid in between.
>>Please advise on how I can confirm my setup or if this is possibly a bug
>>how to diagnose further.
>
>I've experienced long delays (30 seconds? It seemed longer) in a system
>when a disk fails for a genuine reason - (I've deliberately run badblocks
>on an md device when I knew one of the underlying devices had genuine bad
>blocks) maybe the md code really tries hard to read the block, maybe the
>underlying device driver tries really hard), but in these cases, I've seen
>the system more or less freeze (all processes accessing that device
>anyway) until the raid code decided to kick the device out of the array.

I've seen this too. The worst case can actually last for over 2 minutes.
We've been running with a patch to the RAID 1 driver that handles this
so critical applications do not hang for too long.
Basically it uses timers in the RAID 1 driver to force the disk to be
treated as actually having failed if it doesn't respond within a
reasonable time (tunable but usually ~3 seconds). It then handles the
I/O requests coming back asynchronously and does the cleanup.

>Maybe XFS has a timer and doesn't like devices to "go away" for a long
>period of time?

Not that I know of but I would need to look. Any XFS wizard's comments?

mark

>>If it makes a difference, I am running linux-2.4.26
>
>I've used 2.4.x for a long time - I did try xfs about a year ago, but
>wasn't happy with it all (for various reasons).
>
>Gordon
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
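
The timer idea described in the reply above can be sketched roughly as
follows. This is only a user-space simulation in Python, not the patch
itself (which was not posted in this message) and not md driver code.
The names Disk, Raid1, TIMEOUT_S, latency_s and complete_late are all
hypothetical, and disk response times are simulated numbers rather than
real I/O. It shows a read giving up on a mirror once the timeout
expires, marking that mirror failed, serving the read from the other
mirror, and discarding the stalled request when it finally comes back.

```python
from dataclasses import dataclass, field

TIMEOUT_S = 3.0  # hypothetical tunable; the post mentions roughly 3 seconds


@dataclass
class Disk:
    name: str
    latency_s: float                      # simulated time to answer a read
    data: dict = field(default_factory=dict)
    failed: bool = False


@dataclass
class Raid1:
    disks: list
    timeout_s: float = TIMEOUT_S
    late: list = field(default_factory=list)  # requests still stuck on failed disks
    elapsed_s: float = 0.0                    # simulated time callers spent waiting

    def read(self, block):
        for disk in self.disks:
            if disk.failed:
                continue
            if disk.latency_s > self.timeout_s:
                # The timer fires before the disk answers: treat the disk
                # as failed and fall through to the next mirror.
                disk.failed = True
                self.elapsed_s += self.timeout_s
                self.late.append((disk.name, block))
                continue
            self.elapsed_s += disk.latency_s
            return disk.data[block]
        raise OSError("no working mirror left")

    def complete_late(self):
        # Stalled requests eventually return; drop them and clean up.
        done, self.late = self.late, []
        return done


# One mirror stalls for two minutes, the other answers quickly.
slow = Disk("sda", latency_s=120.0, data={0: b"x"})
good = Disk("sdb", latency_s=0.01, data={0: b"x"})
array = Raid1([slow, good])
payload = array.read(0)
```

In this simulation the caller waits for the timeout plus one fast read
instead of the full two-minute stall, and later reads skip the failed
mirror entirely. A real driver would also have to deal with writes,
resync and error reporting, none of which is modelled here.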