From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bill Davidsen
Subject: Re: Software RAID when it works and when it doesn't
Date: Wed, 24 Oct 2007 16:04:19 -0400
Message-ID: <471FA543.9090502@tmr.com>
References: <14526.1192571833@mdt.ecitele.com> <87bqaw5tqb.fsf@informatik.uni-tuebingen.de> <1192777672.16416.495.camel@w100> <471E79A5.5020607@tmr.com> <1193205003.23414.72.camel@w100>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1193205003.23414.72.camel@w100>
Sender: linux-raid-owner@vger.kernel.org
To: Alberto Alonso
Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Alberto Alonso wrote:
> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
>
>> I'm not sure the timeouts are the problem, even if md did its own
>> timeout, it then needs a way to tell the driver (or device) to stop
>> retrying. I don't believe that's available, certainly not everywhere,
>> and anything other than everywhere would turn the md code into a nest of
>> exceptions.
>
> If we lose the ability to communicate with that drive I don't see it
> as a problem (that's the whole point, we kick it out of the array). So,
> if we can't tell the driver about the failure we are still OK, md could
> successfully deal with misbehaved drivers.

I think what you really want is to notice how long the drive and driver
took to recover or fail, and take action based on that. In general "kick
the drive" is not optimal for a few bad spots, even if the drive
recovery sucks.

-- 
bill davidsen
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
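
A minimal sketch of the idea above: measure how long each request took to recover or fail, and choose an action from that rather than kicking the drive on the first error. The thresholds, the action names, and the DriveHealth class are all hypothetical, chosen only for illustration; this is not how md behaves.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical thresholds, for illustration only.
SLOW_SECONDS = 5.0          # a request slower than this counts as a slow event
HARD_LIMIT_SECONDS = 60.0   # one request this slow is treated as a dead drive
MAX_SLOW_EVENTS = 3         # slow events tolerated among recent requests
WINDOW = 10                 # how many recent requests to remember


@dataclass
class DriveHealth:
    """Tracks how long recent requests took to recover or fail."""
    recent: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def record(self, elapsed_seconds: float, succeeded: bool) -> str:
        """Record one completed request and return a suggested action."""
        if elapsed_seconds >= HARD_LIMIT_SECONDS:
            return "kick"        # drive or driver took far too long
        slow = elapsed_seconds >= SLOW_SECONDS
        self.recent.append(slow)
        if sum(self.recent) > MAX_SLOW_EVENTS:
            return "kick"        # slow recovery has become a pattern
        if not succeeded:
            return "repair"      # a bad spot: handle the block, keep the drive
        return "watch" if slow else "ok"
```

With these numbers a single bad spot that fails after a few seconds yields "repair", while a drive that keeps taking many seconds per request is eventually kicked.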