From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roger Heflin
Subject: Re: raid failure question
Date: Tue, 12 Jan 2010 06:08:15 -0600
Message-ID: <4B4C662F.3010305@gmail.com>
References: <1263232840.8962.193.camel@kije> <20100111205332.GA24486@cthulhu.home.robinhill.me.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20100111205332.GA24486@cthulhu.home.robinhill.me.uk>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Robin Hill wrote:
> On Mon Jan 11, 2010 at 11:00:40AM -0700, Tim Bock wrote:
>
>> Hello,
>>
>> Excluding the obvious multi-disk or bus failures, can anyone describe
>> what type of disk failure a raid cannot detect/recover from?
>>
>> I have had two disk failures over the last three months, and in spite of
>> having a hot spare, manual intervention was required each time to make
>> the raid usable again. I'm just not sure if I'm not setting something
>> up right, or if there is some other issue.
>>
>> Thanks for any comments or suggestions.
>>
> Any failure where the disk doesn't actually return an error (within a
> reasonable time). For example, consumer grade disks often have very
> long retry times - this can mean the array is unusable for a long time
> until the disk eventually fails the read.
>
> If the disk actually returns an error then, AFAIK, the RAID array should
> always be able to recover from it.
>
> Cheers,
> Robin

The OS will time the disk out after about 30 seconds if it does not
answer, and the disk is then treated as "BAD".

On Fibre Channel this type of failure is fairly common, when something
fails in the fabric such that the disk can no longer talk to the machine.
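
For anyone who wants to check the timeout on their own machine: on Linux, as far as I know, the roughly 30 second limit is the per-device SCSI command timer, exposed in sysfs as /sys/block/<dev>/device/timeout. Below is a minimal Python sketch under that assumption. The function name read_scsi_timeouts is made up for illustration, and devices that do not expose the file (loop devices, for instance) are simply skipped.

```python
from pathlib import Path


def read_scsi_timeouts(sysfs_block="/sys/block"):
    """Return {device_name: timeout_seconds} for block devices that
    expose a command timeout under <sysfs_block>/<dev>/device/timeout.

    Assumes the usual Linux sysfs layout; read_scsi_timeouts is a
    hypothetical helper name, not part of any existing tool.
    """
    timeouts = {}
    root = Path(sysfs_block)
    if not root.is_dir():
        return timeouts
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # No timeout attribute, or unreadable content: skip this device.
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in read_scsi_timeouts().items():
        print(f"{name}: {seconds}s")
```

If a consumer grade disk retries internally for longer than the value shown, the OS gives up first and the disk is dropped, which matches the behaviour described above.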