From: Alberto Alonso <alberto@ggsys.net>
To: Doug Ledford <dledford@redhat.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Implementing low level timeouts within MD
Date: Sun, 28 Oct 2007 01:27:02 -0500
Message-ID: <1193552822.6541.8.camel@w100>
In-Reply-To: <1193529329.10336.366.camel@firewall.xsintricity.com>
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote:
> On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
> > Regardless of the fact that it is not MD's fault, it does make
> > software raid an invalid choice when combined with those drivers. A
> > single disk failure within a RAID5 array bringing a file server down
> > is not a valid option under most situations.
>
> Without knowing the exact controller you have and driver you use, I
> certainly can't tell the situation. However, I will note that there are
> times when no matter how well the driver is written, the wrong type of
> drive failure *will* take down the entire machine. For example, on an
> SPI SCSI bus, a single drive failure that involves a blown terminator
> will cause the electrical signaling on the bus to go dead no matter what
> the driver does to try and work around it.
Sorry, I thought I had copied the list on the info that I sent to Richard.
Here are the main hardware combinations.
--- Excerpt Start ----
Certainly. The times when I had good results (i.e. failed drives
with properly degraded arrays) have been with old PATA-based IDE
controllers built into the motherboard and the Highpoint PATA
cards. The failures (i.e. a single disk failure bringing the whole
server down) have been with the following:
* External disks on USB enclosures, both RAID1 and RAID5 (two different
systems). I don't know the actual controller for these. I assume it is
related to usb-storage, but I can probably research the actual chipset
if it is needed.
* Internal serverworks PATA controller on a netengine server. The
server is off waiting to get picked up, so I can't get the important
details.
* Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3
disks each. (only one drive on one array went bad)
* VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
* And the most complex is this week's server with 4 PCI/PCI-X cards.
But the one that hung the server was a 4 disk RAID5 array on a
RocketRAID1540 card.
--- Excerpt End ----
>
> > I wasn't even asking as to whether or not it should, I was asking if
> > it could.
>
> It could, but without careful control of timeouts for differing types of
> devices, you could end up making the software raid less reliable instead
> of more reliable overall.
Even if the default timeout were really long (e.g. 1 minute) and then
configurable on a per-device (or per-class) basis via /proc, it would
really help.
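
To make the idea concrete, here is a minimal sketch of the lookup order I
have in mind: a per-device override wins, then a per-class override, then
the long default. None of this exists in MD; the function names, the class
labels and the dictionaries are hypothetical and only illustrate the policy.
The second helper assumes the SCSI layer's per-device command timeout is
exposed at <sysfs>/block/<dev>/device/timeout, which applies to SCSI-backed
disks only and is a different knob from an MD-level timeout.

```python
from pathlib import Path

DEFAULT_TIMEOUT_S = 60  # the "really long" default suggested above


def effective_timeout(device, device_class, per_device=None, per_class=None,
                      default=DEFAULT_TIMEOUT_S):
    """Return the timeout in seconds for one array member (hypothetical policy).

    Lookup order: per-device override, then per-class override, then default.
    """
    per_device = per_device or {}
    per_class = per_class or {}
    if device in per_device:
        return per_device[device]
    if device_class in per_class:
        return per_class[device_class]
    return default


def read_scsi_timeout(device, sysfs_root="/sys"):
    """Read the SCSI-layer command timeout for `device`, in seconds.

    Returns None when the attribute is missing or unreadable, for example
    when the disk is not handled by the SCSI layer. The path layout is an
    assumption, not something MD provides.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

For example, effective_timeout("sdc", "usb", per_class={"usb": 120}) would
give the USB enclosures a longer limit, while every other disk keeps the
one minute default.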
> Generally speaking, most modern drivers will work well. It's easier to
> maintain a list of known bad drivers than known good drivers.
That's what has been so frustrating. The old PATA IDE hardware always
worked and the new stuff is what has crashed.
> Be careful which hardware raid you choose, as in the past several brands
> have been known to have the exact same problem you are having with
> software raid, so you may not end up buying yourself anything. (I'm not
> naming names because it's been long enough since I paid attention to
> hardware raid driver issues that the issues I knew of could have been
> solved by now and I don't want to improperly accuse a currently well
> working driver of being broken)
I have settled for 3ware. All my tests showed that it performed quite
well and kicked drives out when needed. Of course, I haven't had a
bad drive on a 3ware production server yet, so.... I may end up
pulling the little bit of hair I have left.
I am now rushing the RocketRAID 2220 into production without testing
due to it being the only thing I could get my hands on. I'll report
any experiences as they happen.
Thanks for all the info,
Alberto