linux-raid.vger.kernel.org archive mirror
From: Doug Ledford <dledford@redhat.com>
To: Alberto Alonso <alberto@ggsys.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: Implementing low level timeouts within MD
Date: Sat, 27 Oct 2007 19:55:29 -0400
Message-ID: <1193529329.10336.366.camel@firewall.xsintricity.com>
In-Reply-To: <1193521561.7690.14.camel@w100>

On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
> On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote:
> > 
> > This isn't an md problem, this is a low level disk driver problem.  Yell
> > at the author of the disk driver in question.  If that driver doesn't
> > time things out and return errors up the stack in a reasonable time,
> > then it's broken.  Md should not, and realistically can not, take the
> > place of a properly written low level driver.
> > 
> 
> I am not arguing whether or not MD is at fault, I know it isn't. 
> 
> Regardless of the fact that it is not MD's fault, it does make
> software raid an invalid choice when combined with those drivers. A
> single disk failure within a RAID5 array bringing a file server down
> is not a valid option under most situations.

Without knowing the exact controller you have and the driver you use, I
can't judge your particular situation.  However, I will note that there
are times when, no matter how well the driver is written, the wrong type
of drive failure *will* take down the entire machine.  For example, on
an SPI SCSI bus, a single drive failure that involves a blown terminator
will cause the electrical signaling on the bus to go dead no matter what
the driver does to try to work around it.

> I wasn't even asking as to whether or not it should, I was asking if
> it could.

It could, but without careful control of timeouts for differing types of
devices, you could end up making the software raid less reliable instead
of more reliable overall.
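
As a rough illustration of that trade-off, here is a minimal sketch.  It
is not md code, and every name and value in it (CLASS_TIMEOUTS,
should_fail, the device classes and their limits) is a hypothetical
chosen for the example.  It shows how one short timeout applied to every
device would fail a slow but healthy disk that a per-class limit would
tolerate.

```python
# Hypothetical per-device-class timeouts in seconds; all names and values
# here are illustrative assumptions, not md or driver defaults.
CLASS_TIMEOUTS = {"fast_disk": 5.0, "slow_disk": 30.0}


def should_fail(device_class, elapsed, timeouts=None, default=30.0):
    """Return True if a request outstanding for `elapsed` seconds should be
    treated as a device failure."""
    limits = CLASS_TIMEOUTS if timeouts is None else timeouts
    return elapsed > limits.get(device_class, default)


# A healthy but slow device that needs 10 seconds to answer:
print(should_fail("slow_disk", 10.0))                      # per-class limit
print(should_fail("slow_disk", 10.0, {"slow_disk": 5.0}))  # one short limit for all
```

The first call reports no failure and the second reports a failure for
the same healthy device, which is the "less reliable" outcome a badly
chosen upper-layer timeout could produce.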

>  Should is a relative term, could is not. If the MD code
> can not cope with poorly written drivers then a list of valid drivers
> and cards would be nice to have (that's why I posted my ... when it
> works and when it doesn't, I was trying to come up with such a list).

Generally speaking, most modern drivers will work well.  It's easier to
maintain a list of known bad drivers than known good drivers.

> I only got 1 answer with brand specific information to figure out when
> it works and when it doesn't work. My recent experience is that too
> many drivers seem to have the problem so software raid is no longer
> an option for any new systems that I build, and as time and money
> permit I'll be switching all my legacy servers to hardware/firmware
> raid.

Be careful which hardware raid you choose.  In the past, several brands
have been known to have the exact same problem you are having with
software raid, so you may not end up buying yourself anything.  (I'm not
naming names because it's been long enough since I paid attention to
hardware raid driver issues that the issues I knew of could have been
solved by now, and I don't want to improperly accuse a currently well
working driver of being broken.)

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband