* Implementing low level timeouts within MD
@ 2007-10-26 17:12 Alberto Alonso
2007-10-26 19:00 ` Doug Ledford
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-10-26 17:12 UTC (permalink / raw)
To: linux-raid
I've been asking on my other posts but haven't seen
a direct reply to this question:
Can MD implement timeouts so that it detects problems when
drivers don't come back?
For me this year shall be known as "the year the array
stood still" (bad scifi reference :-)
After 4 different array failures all due to a single drive
failure I think it would really be helpful if the md code
timed out the driver.
Thanks,
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-26 17:12 Implementing low level timeouts within MD Alberto Alonso
@ 2007-10-26 19:00 ` Doug Ledford
2007-10-27 10:33 ` Samuel Tardieu
2007-10-27 21:46 ` Alberto Alonso
2007-10-27 18:59 ` Richard Scobie
2007-10-30 4:47 ` Neil Brown
2 siblings, 2 replies; 28+ messages in thread
From: Doug Ledford @ 2007-10-26 19:00 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]
On Fri, 2007-10-26 at 12:12 -0500, Alberto Alonso wrote:
> I've been asking on my other posts but haven't seen
> a direct reply to this question:
>
> Can MD implement timeouts so that it detects problems when
> drivers don't come back?
>
> For me this year shall be known as "the year the array
> stood still" (bad scifi reference :-)
>
> After 4 different array failures all due to a single drive
> failure I think it would really be helpful if the md code
> timed out the driver.
This isn't an md problem, this is a low level disk driver problem. Yell
at the author of the disk driver in question. If that driver doesn't
time things out and return errors up the stack in a reasonable time,
then it's broken. Md should not, and realistically can not, take the
place of a properly written low level driver.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-26 19:00 ` Doug Ledford
@ 2007-10-27 10:33 ` Samuel Tardieu
2007-10-30 5:19 ` Alberto Alonso
2007-10-27 21:46 ` Alberto Alonso
1 sibling, 1 reply; 28+ messages in thread
From: Samuel Tardieu @ 2007-10-27 10:33 UTC (permalink / raw)
To: linux-raid
>>>>> "Doug" == Doug Ledford <dledford@redhat.com> writes:
Doug> This isn't an md problem, this is a low level disk driver
Doug> problem. Yell at the author of the disk driver in question. If
Doug> that driver doesn't time things out and return errors up the
Doug> stack in a reasonable time, then it's broken. Md should not,
Doug> and realistically can not, take the place of a properly written
Doug> low level driver.
I agree with Doug: nothing prevents you from using md above very slow
drivers (such as remote disks or even a filesystem implemented over a
tape device to make it extreme). Only the low-level drivers know when
it is appropriate to timeout or fail.
Sam
--
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-26 17:12 Implementing low level timeouts within MD Alberto Alonso
2007-10-26 19:00 ` Doug Ledford
@ 2007-10-27 18:59 ` Richard Scobie
[not found] ` <1193522726.7690.31.camel@w100>
2007-10-30 4:47 ` Neil Brown
2 siblings, 1 reply; 28+ messages in thread
From: Richard Scobie @ 2007-10-27 18:59 UTC (permalink / raw)
To: linux-raid
Alberto Alonso wrote:
> After 4 different array failures all due to a single drive
> failure I think it would really be helpful if the md code
> timed out the driver.
Hi Alberto,
Sorry you've been having so much trouble.
For interest, can you tell us what drives and controllers are involved?
I've been running md for 8 years and over that time have had probably
half a dozen drives failed out of arrays without any problems.
Regards,
Richard
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-26 19:00 ` Doug Ledford
2007-10-27 10:33 ` Samuel Tardieu
@ 2007-10-27 21:46 ` Alberto Alonso
2007-10-27 23:55 ` Doug Ledford
1 sibling, 1 reply; 28+ messages in thread
From: Alberto Alonso @ 2007-10-27 21:46 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote:
>
> This isn't an md problem, this is a low level disk driver problem. Yell
> at the author of the disk driver in question. If that driver doesn't
> time things out and return errors up the stack in a reasonable time,
> then it's broken. Md should not, and realistically can not, take the
> place of a properly written low level driver.
>
I am not arguing whether or not MD is at fault, I know it isn't.
Regardless of the fact that it is not MD's fault, it does make
software raid an invalid choice when combined with those drivers. A
single disk failure within a RAID5 array bringing a file server down
is not a valid option under most situations.
I wasn't even asking as to whether or not it should, I was asking if
it could. Should is a relative term, could is not. If the MD code
can not cope with poorly written drivers then a list of valid drivers
and cards would be nice to have (that's why I posted my ... when it
works and when it doesn't, I was trying to come up with such a list).
I only got 1 answer with brand specific information to figure out when
it works and when it doesn't work. My recent experience is that too
many drivers seem to have the problem so software raid is no longer
an option for any new systems that I build, and as time and money
permits I'll be switching all my legacy servers to hardware/firmware
raid.
Thanks,
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
[not found] ` <1193522726.7690.31.camel@w100>
@ 2007-10-27 23:46 ` Richard Scobie
0 siblings, 0 replies; 28+ messages in thread
From: Richard Scobie @ 2007-10-27 23:46 UTC (permalink / raw)
To: Linux RAID Mailing List
Alberto Alonso wrote:
> What hardware do you use? I was trying to compile a list of known
> configurations capable of detecting failures and degrading properly.
To date I have not yet had a SATA-based array drive go faulty - all mine
have been PATA arrays on Intel or AMD MB controllers, which, as per your
experience, have failed out drives OK.
I have one 3ware PATA card that is running hardware RAID10 and it has
failed 4 drives over the years without trouble.
Regards,
Richard
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-27 21:46 ` Alberto Alonso
@ 2007-10-27 23:55 ` Doug Ledford
2007-10-28 6:27 ` Alberto Alonso
0 siblings, 1 reply; 28+ messages in thread
From: Doug Ledford @ 2007-10-27 23:55 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3032 bytes --]
On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
> On Fri, 2007-10-26 at 15:00 -0400, Doug Ledford wrote:
> >
> > This isn't an md problem, this is a low level disk driver problem. Yell
> > at the author of the disk driver in question. If that driver doesn't
> > time things out and return errors up the stack in a reasonable time,
> > then it's broken. Md should not, and realistically can not, take the
> > place of a properly written low level driver.
> >
>
> I am not arguing whether or not MD is at fault, I know it isn't.
>
> Regardless of the fact that it is not MD's fault, it does make
> software raid an invalid choice when combined with those drivers. A
> single disk failure within a RAID5 array bringing a file server down
> is not a valid option under most situations.
Without knowing the exact controller you have and driver you use, I
certainly can't tell the situation. However, I will note that there are
times when no matter how well the driver is written, the wrong type of
drive failure *will* take down the entire machine. For example, on an
SPI SCSI bus, a single drive failure that involves a blown terminator
will cause the electrical signaling on the bus to go dead no matter what
the driver does to try and work around it.
> I wasn't even asking as to whether or not it should, I was asking if
> it could.
It could, but without careful control of timeouts for differing types of
devices, you could end up making the software raid less reliable instead
of more reliable overall.
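(For illustration only, here is a rough sketch of the kind of md-level
watchdog being discussed, written against the ordinary 2.6-era timer API.
This is not existing md code; the member_watchdog structure and the two
helpers are invented for the example, and the timer callback shows the
exact risk described above: at this layer a slow but healthy device is
indistinguishable from a dead one.)

    #include <linux/timer.h>
    #include <linux/jiffies.h>
    #include <linux/raid/md.h>

    /* Hypothetical per-member watchdog; NOT existing md code. */
    struct member_watchdog {
            struct timer_list timer;
            mddev_t *mddev;
            mdk_rdev_t *rdev;
    };

    /* Timer expired: no completion arrived in time, so eject the member.
     * This is the risky part: "slow" and "broken" look identical here. */
    static void member_timeout(unsigned long data)
    {
            struct member_watchdog *w = (struct member_watchdog *)data;

            md_error(w->mddev, w->rdev);
    }

    /* Arm before submitting a request to the member device... */
    static void arm_member_watchdog(struct member_watchdog *w,
                                    unsigned long secs)
    {
            setup_timer(&w->timer, member_timeout, (unsigned long)w);
            mod_timer(&w->timer, jiffies + secs * HZ);
    }

    /* ...and disarm from the normal completion path. */
    static void disarm_member_watchdog(struct member_watchdog *w)
    {
            del_timer_sync(&w->timer);
    }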
> Should is a relative term, could is not. If the MD code
> can not cope with poorly written drivers then a list of valid drivers
> and cards would be nice to have (that's why I posted my ... when it
> works and when it doesn't, I was trying to come up with such a list).
Generally speaking, most modern drivers will work well. It's easier to
maintain a list of known bad drivers than known good drivers.
> I only got 1 answer with brand specific information to figure out when
> it works and when it doesn't work. My recent experience is that too
> many drivers seem to have the problem so software raid is no longer
> an option for any new systems that I build, and as time and money
> permits I'll be switching all my legacy servers to hardware/firmware
> raid.
Be careful which hardware raid you choose, as in the past several brands
have been known to have the exact same problem you are having with
software raid, so you may not end up buying yourself anything. (I'm not
naming names because it's been long enough since I paid attention to
hardware raid driver issues that the issues I knew of could have been
solved by now and I don't want to improperly accuse a currently well
working driver of being broken)
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-27 23:55 ` Doug Ledford
@ 2007-10-28 6:27 ` Alberto Alonso
2007-10-29 17:22 ` Doug Ledford
0 siblings, 1 reply; 28+ messages in thread
From: Alberto Alonso @ 2007-10-28 6:27 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote:
> On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
> > Regardless of the fact that it is not MD's fault, it does make
> > software raid an invalid choice when combined with those drivers. A
> > single disk failure within a RAID5 array bringing a file server down
> > is not a valid option under most situations.
>
> Without knowing the exact controller you have and driver you use, I
> certainly can't tell the situation. However, I will note that there are
> times when no matter how well the driver is written, the wrong type of
> drive failure *will* take down the entire machine. For example, on an
> SPI SCSI bus, a single drive failure that involves a blown terminator
> will cause the electrical signaling on the bus to go dead no matter what
> the driver does to try and work around it.
Sorry, I thought I copied the list with the info that I sent to Richard.
Here are the main hardware combinations.
--- Excerpt Start ----
Certainly. The times when I had good results (ie. failed drives
with properly degraded arrays) have been with old PATA-based IDE
controllers built into the motherboard and the Highpoint PATA
cards. The failures (ie. a single disk failure bringing the whole
server down) have been with the following:
* External disks on USB enclosures, both RAID1 and RAID5 (two different
systems) Don't know the actual controller for these. I assume it is
related to usb-storage, but can probably research the actual chipset,
if it is needed.
* Internal serverworks PATA controller on a netengine server. The
server is off waiting to get picked up, so I can't get the important
details.
* Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3
disks each. (only one drive on one array went bad)
* VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
* And the most complex is this week's server with 4 PCI/PCI-X cards.
But the one that hanged the server was a 4 disk RAID5 array on a
RocketRAID1540 card.
--- Excerpt End ----
>
> > I wasn't even asking as to whether or not it should, I was asking if
> > it could.
>
> It could, but without careful control of timeouts for differing types of
> devices, you could end up making the software raid less reliable instead
> of more reliable overall.
Even if the default timeout were really long (ie. 1 minute) and then
configurable per device (or class) via /proc, it would really help.
> Generally speaking, most modern drivers will work well. It's easier to
> maintain a list of known bad drivers than known good drivers.
That's what has been so frustrating. The old PATA IDE hardware always
worked and the new stuff is what has crashed.
> Be careful which hardware raid you choose, as in the past several brands
> have been known to have the exact same problem you are having with
> software raid, so you may not end up buying yourself anything. (I'm not
> naming names because it's been long enough since I paid attention to
> hardware raid driver issues that the issues I knew of could have been
> solved by now and I don't want to improperly accuse a currently well
> working driver of being broken)
I have settled for 3ware. All my tests showed that it performed quite
well and kicked drives out when needed. Of course, I haven't had a
bad drive on a 3ware production server yet, so.... I may end up
pulling the little bit of hair I have left.
I am now rushing the RocketRAID 2220 into production without testing
due to it being the only thing I could get my hands on. I'll report
any experiences as they happen.
Thanks for all the info,
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-28 6:27 ` Alberto Alonso
@ 2007-10-29 17:22 ` Doug Ledford
2007-10-30 5:08 ` Alberto Alonso
2007-11-07 8:47 ` Goswin von Brederlow
0 siblings, 2 replies; 28+ messages in thread
From: Doug Ledford @ 2007-10-29 17:22 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5321 bytes --]
On Sun, 2007-10-28 at 01:27 -0500, Alberto Alonso wrote:
> On Sat, 2007-10-27 at 19:55 -0400, Doug Ledford wrote:
> > On Sat, 2007-10-27 at 16:46 -0500, Alberto Alonso wrote:
> > > Regardless of the fact that it is not MD's fault, it does make
> > > software raid an invalid choice when combined with those drivers. A
> > > single disk failure within a RAID5 array bringing a file server down
> > > is not a valid option under most situations.
> >
> > Without knowing the exact controller you have and driver you use, I
> > certainly can't tell the situation. However, I will note that there are
> > times when no matter how well the driver is written, the wrong type of
> > drive failure *will* take down the entire machine. For example, on an
> > SPI SCSI bus, a single drive failure that involves a blown terminator
> > will cause the electrical signaling on the bus to go dead no matter what
> > the driver does to try and work around it.
>
> Sorry, I thought I copied the list with the info that I sent to Richard.
> Here are the main hardware combinations.
>
> --- Excerpt Start ----
> Certainly. The times when I had good results (ie. failed drives
> with properly degraded arrays) have been with old PATA-based IDE
> controllers built into the motherboard and the Highpoint PATA
> cards. The failures (ie. a single disk failure bringing the whole
> server down) have been with the following:
>
> * External disks on USB enclosures, both RAID1 and RAID5 (two different
> systems) Don't know the actual controller for these. I assume it is
> related to usb-storage, but can probably research the actual chipset,
> if it is needed.
OK, these you don't get to count. If you run raid over USB...well...you
get what you get. IDE never really was a proper server interface, and
SATA is much better, but USB was never anything other than a means to
connect simple devices without having to put a card in your PC, it was
never intended to be a raid transport.
> * Internal serverworks PATA controller on a netengine server. The
> server is off waiting to get picked up, so I can't get the important
> details.
1 PATA failure.
> * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3
> disks each. (only one drive on one array went bad)
>
> * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
>
> * And the most complex is this week's server with 4 PCI/PCI-X cards.
> But the one that hanged the server was a 4 disk RAID5 array on a
> RocketRAID1540 card.
And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else
it has more PATA ports than I've ever seen.
Was the RocketRAID card in hardware or software raid mode? It sounds
like it could be a combination of both, something like hardware on the
card, and software across the different cards or something like that.
What kernels were these under?
> --- Excerpt End ----
>
> >
> > > I wasn't even asking as to whether or not it should, I was asking if
> > > it could.
> >
> > It could, but without careful control of timeouts for differing types of
> > devices, you could end up making the software raid less reliable instead
> > of more reliable overall.
>
> Even if the default timeout were really long (ie. 1 minute) and then
> configurable per device (or class) via /proc, it would really help.
It's a band-aid. It's working around other bugs in the kernel instead
of fixing the real problem.
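For what it's worth, on 2.6 kernels the layer below md already exposes a
per-device command timeout for SCSI-managed disks (which includes
libata-driven SATA drives), although through sysfs rather than /proc. The
path below is quoted from memory and the value is in seconds; note that it
only bounds commands the low-level driver actually completes or aborts, so
it is no substitute for a working error handler. A minimal user-space
sketch, with the helper name invented for the example:

    #include <stdio.h>
    #include <stdlib.h>

    /* Write a new command timeout (in seconds) for one SCSI/SATA disk.
     * Assumed path on 2.6 kernels: /sys/block/<dev>/device/timeout */
    static int set_scsi_timeout(const char *dev, int seconds)
    {
            char path[128];
            FILE *f;

            snprintf(path, sizeof(path), "/sys/block/%s/device/timeout", dev);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%d\n", seconds);
            return fclose(f);
    }

    int main(int argc, char **argv)
    {
            /* usage: set_timeout <disk> <seconds>, e.g. set_timeout sda 60 */
            if (argc != 3)
                    return 1;
            return set_scsi_timeout(argv[1], atoi(argv[2])) ? 1 : 0;
    }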
> > Generally speaking, most modern drivers will work well. It's easier to
> > maintain a list of known bad drivers than known good drivers.
>
> That's what has been so frustrating. The old PATA IDE hardware always
> worked and the new stuff is what has crashed.
In all fairness, the SATA core is still relatively young. IDE was
around for eons, whereas Jeff started the SATA code just a few years
back. In that time I know he's had to deal with both software bugs and
hardware bugs that would lock a SATA port up solid with no return. What
it sounds like to me is you found some of those.
> > Be careful which hardware raid you choose, as in the past several brands
> > have been known to have the exact same problem you are having with
> > software raid, so you may not end up buying yourself anything. (I'm not
> > naming names because it's been long enough since I paid attention to
> > hardware raid driver issues that the issues I knew of could have been
> > solved by now and I don't want to improperly accuse a currently well
> > working driver of being broken)
>
> I have settled for 3ware. All my tests showed that it performed quite
> well and kicked drives out when needed. Of course, I haven't had a
> bad drive on a 3ware production server yet, so.... I may end up
> pulling the little bit of hair I have left.
>
> I am now rushing the RocketRAID 2220 into production without testing
> due to it being the only thing I could get my hands on. I'll report
> any experiences as they happen.
>
> Thanks for all the info,
>
> Alberto
>
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-26 17:12 Implementing low level timeouts within MD Alberto Alonso
2007-10-26 19:00 ` Doug Ledford
2007-10-27 18:59 ` Richard Scobie
@ 2007-10-30 4:47 ` Neil Brown
2 siblings, 0 replies; 28+ messages in thread
From: Neil Brown @ 2007-10-30 4:47 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
On Friday October 26, alberto@ggsys.net wrote:
> I've been asking on my other posts but haven't seen
> a direct reply to this question:
>
> Can MD implement timeouts so that it detects problems when
> drivers don't come back?
No.
However it is possible that we will start sending the BIO_RW_FAILFAST
flag down on some or all requests. That might make drivers fail more
promptly, which might be a good thing. However it won't fix bugs in
drivers and - as has been said elsewhere on this thread - that is the
real problem.
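To make that concrete, here is a rough sketch (not actual md code; where
exactly md would set the flag is still an open question) of what marking a
member request fail-fast could look like with the current bio interface:

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /* Sketch only: flag a request aimed at one array member as fail-fast
     * so the low-level driver can give up quickly instead of retrying at
     * length.  It still depends on the driver honouring the flag. */
    static void submit_failfast_bio(struct block_device *bdev, struct bio *bio)
    {
            bio->bi_bdev = bdev;
            bio->bi_rw |= (1 << BIO_RW_FAILFAST);
            generic_make_request(bio);
    }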
NeilBrown
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-29 17:22 ` Doug Ledford
@ 2007-10-30 5:08 ` Alberto Alonso
2007-10-30 12:12 ` Gabor Gombas
` (2 more replies)
2007-11-07 8:47 ` Goswin von Brederlow
1 sibling, 3 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-10-30 5:08 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
> OK, these you don't get to count. If you run raid over USB...well...you
> get what you get. IDE never really was a proper server interface, and
> SATA is much better, but USB was never anything other than a means to
> connect simple devices without having to put a card in your PC, it was
> never intended to be a raid transport.
I still count them ;-) I guess I just would have hoped for software raid
to really not care about the lower layers.
>
> > * Internal serverworks PATA controller on a netengine server. The
> > server is off waiting to get picked up, so I can't get the important
> > details.
>
> 1 PATA failure.
I was surprised by this one, I did have good luck with PATA in
the past. The kernel is whatever came standard in Fedora Core 2.
>
> > * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3
> > disks each. (only one drive on one array went bad)
> >
> > * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
> >
> > * And the most complex is this week's server with 4 PCI/PCI-X cards.
> > But the one that hanged the server was a 4 disk RAID5 array on a
> > RocketRAID1540 card.
>
> And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else
> it has more PATA ports than I've ever seen.
>
> Was the RocketRAID card in hardware or software raid mode? It sounds
> like it could be a combination of both, something like hardware on the
> card, and software across the different cards or something like that.
>
> What kernels were these under?
Yes, these 3 were all SATA. The kernels (in the same order as above)
are:
* 2.4.21-4.ELsmp #1 (Basically RHEL v3)
* 2.6.18-4-686 #1 SMP on a Fedora Core release 2
* 2.6.17.13 (compiled from vanilla sources)
The RocketRAID was configured for all drives as legacy/normal and
software RAID5 across all drives. I wasn't using hardware raid on
the last described system when it crashed.
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-27 10:33 ` Samuel Tardieu
@ 2007-10-30 5:19 ` Alberto Alonso
2007-10-30 17:39 ` Doug Ledford
0 siblings, 1 reply; 28+ messages in thread
From: Alberto Alonso @ 2007-10-30 5:19 UTC (permalink / raw)
To: Samuel Tardieu; +Cc: linux-raid
On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote:
> I agree with Doug: nothing prevents you from using md above very slow
> drivers (such as remote disks or even a filesystem implemented over a
> tape device to make it extreme). Only the low-level drivers know when
> it is appropriate to timeout or fail.
>
> Sam
The problem is when some of these drivers are just not smart
enough to keep themselves out of trouble. Unfortunately I've
been bitten by apparently too many of them.
I'll repeat my plea one more time. Is there a published list
of tested combinations that respond well to hardware failures
and fully signal the md code so that nothing hangs?
If not, I would like to see what people that have experienced
hardware failures and survived them are using so that such
a list can be compiled.
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-30 5:08 ` Alberto Alonso
@ 2007-10-30 12:12 ` Gabor Gombas
2007-10-30 17:58 ` Doug Ledford
2007-11-01 14:19 ` Bill Davidsen
2 siblings, 0 replies; 28+ messages in thread
From: Gabor Gombas @ 2007-10-30 12:12 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Doug Ledford, linux-raid
On Tue, Oct 30, 2007 at 12:08:07AM -0500, Alberto Alonso wrote:
> > > * Internal serverworks PATA controller on a netengine server. The
> > > server is off waiting to get picked up, so I can't get the important
> > > details.
> >
> > 1 PATA failure.
>
> I was surprised by this one, I did have good luck with PATA in
> the past. The kernel is whatever came standard in Fedora Core 2.
The keyword here is probably not "PATA" but "Serverworks"... AFAIR that
chipset was always considered somewhat problematic. You may want to try
the libata driver; it has a nice comment:
* Note that we don't copy the old serverworks code because the old
* code contains obvious mistakes
But even the new driver retained this comment from the old driver:
* Documentation:
* Available under NDA only. Errata info very hard to get.
It doesn't exactly give me warm feelings about trusting data to this chipset...
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-30 5:19 ` Alberto Alonso
@ 2007-10-30 17:39 ` Doug Ledford
2007-11-01 5:08 ` Alberto Alonso
0 siblings, 1 reply; 28+ messages in thread
From: Doug Ledford @ 2007-10-30 17:39 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Samuel Tardieu, linux-raid
[-- Attachment #1: Type: text/plain, Size: 2250 bytes --]
On Tue, 2007-10-30 at 00:19 -0500, Alberto Alonso wrote:
> On Sat, 2007-10-27 at 12:33 +0200, Samuel Tardieu wrote:
> > I agree with Doug: nothing prevents you from using md above very slow
> > drivers (such as remote disks or even a filesystem implemented over a
> > tape device to make it extreme). Only the low-level drivers know when
> > it is appropriate to timeout or fail.
> >
> > Sam
>
> The problem is when some of these drivers are just not smart
> enough to keep themselves out of trouble. Unfortunately I've
> been bitten by apparently too many of them.
Really, you've only been bitten by three so far. Serverworks PATA
(which I tend to agree with the other person, I would probably chalk
this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
is arranged similar to the SCSI stack with a core library that all the
drivers use, and then hardware dependent driver modules...I suspect that
since you got bit on three different hardware versions that you were in
fact hitting a core library bug, but that's just a suspicion and I could
well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff,
and generally that's what I've always used and had good things to say
about. I've only used SATA for my home systems or workstations, not any
production servers.
> I'll repeat my plea one more time. Is there a published list
> of tested combinations that respond well to hardware failures
> and fully signal the md code so that nothing hangs?
I don't know of one, but like I said, I've not used a lot of the SATA
stuff for production. I would make this one suggestion though, SATA is
still an evolving driver stack to a certain extent, and as such, keeping
with more current kernels than you have been using is likely to be a big
factor in whether or not these sorts of things happen.
> If not, I would like to see what people that have experienced
> hardware failures and survived them are using so that such
> a list can be compiled.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-30 5:08 ` Alberto Alonso
2007-10-30 12:12 ` Gabor Gombas
@ 2007-10-30 17:58 ` Doug Ledford
2007-11-01 14:19 ` Bill Davidsen
2 siblings, 0 replies; 28+ messages in thread
From: Doug Ledford @ 2007-10-30 17:58 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]
On Tue, 2007-10-30 at 00:08 -0500, Alberto Alonso wrote:
> On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
>
> > OK, these you don't get to count. If you run raid over USB...well...you
> > get what you get. IDE never really was a proper server interface, and
> > SATA is much better, but USB was never anything other than a means to
> > connect simple devices without having to put a card in your PC, it was
> > never intended to be a raid transport.
>
> I still count them ;-) I guess I just would have hoped for software raid
> to really not care about the lower layers.
The job of software raid is to help protect your data. In order to do
that, the raid needs to be run over something that *at least* provides a
minimum level of reliability itself. The entire USB spec is written
under the assumption that a USB device can disappear at any time and the
stack must accept that (and it can, just trip on a cable some time and
watch your raid device get all pissy). So, yes, software raid can run
over any block device, but putting it over an unreliable connection
medium is like telling a gladiator that he has to face the lion with no
sword, no shield, and his hands tied behind his back. He might survive,
but you have so seriously handicapped him that it's all but over.
> >
> > > * Supermicro MB with ICH5/ICH5R controller and 2 RAID5 arrays of 3
> > > disks each. (only one drive on one array went bad)
> > >
> > > * VIA VT6420 built into the MB with RAID1 across 2 SATA drives.
> > >
> > > * And the most complex is this week's server with 4 PCI/PCI-X cards.
> > > But the one that hanged the server was a 4 disk RAID5 array on a
> > > RocketRAID1540 card.
> >
> > And 3 SATA failures, right? I'm assuming the Supermicro is SATA or else
> > it has more PATA ports than I've ever seen.
> >
> > Was the RocketRAID card in hardware or software raid mode? It sounds
> > like it could be a combination of both, something like hardware on the
> > card, and software across the different cards or something like that.
> >
> > What kernels were these under?
>
>
> Yes, these 3 were all SATA. The kernels (in the same order as above)
> are:
>
> * 2.4.21-4.ELsmp #1 (Basically RHEL v3)
*Really* old kernel. RHEL3 is in maintenance mode already, and that was
the GA kernel. It was also the first RHEL release with SATA support.
So, first gen driver on first gen kernel.
> * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
> * 2.6.17.13 (compiled from vanilla sources)
>
> The RocketRAID was configured for all drives as legacy/normal and
> software RAID5 across all drives. I wasn't using hardware raid on
> the last described system when it crashed.
So, the system that died *just this week* was running 2.6.17.13? Like I
said in my last email, the SATA stack has been evolving over the last
few years, and that's quite a few revisions behind. My basic advice is
this: if you are going to use the latest and greatest hardware options,
then you should either make sure you are using an up to date distro
kernel of some sort or you need to watch the kernel update announcements
for fixes related to that hardware and update your kernels/drivers as
appropriate.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-30 17:39 ` Doug Ledford
@ 2007-11-01 5:08 ` Alberto Alonso
2007-11-01 14:14 ` Bill Davidsen
2007-11-01 19:16 ` Doug Ledford
0 siblings, 2 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-11-01 5:08 UTC (permalink / raw)
To: Doug Ledford; +Cc: Samuel Tardieu, linux-raid
On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
>
> Really, you've only been bitten by three so far. Serverworks PATA
> (which I tend to agree with the other person, I would probably chalk
3 types of bugs is too many, it basically affected all my customers
with multi-terabyte arrays. Heck, we can also oversimplify things and
say that it is really just one type and define everything as kernel type
problems (or as some other kernel used to say... general protection
error).
I am sorry for not having hundreds of RAID servers from which to draw
statistical analysis. As I have clearly stated in the past I am trying
to come up with a list of known combinations that work. I think my
data points are worth something to some people, especially those
considering SATA drives and software RAID for their file servers. If
you don't consider them important for you that's fine, but please don't
belittle them just because they don't match your needs.
> this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
> is arranged similar to the SCSI stack with a core library that all the
> drivers use, and then hardware dependent driver modules...I suspect that
> since you got bit on three different hardware versions that you were in
> fact hitting a core library bug, but that's just a suspicion and I could
> well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff,
> and generally that's what I've always used and had good things to say
> about. I've only used SATA for my home systems or workstations, not any
> production servers.
The USB array was never meant to be a full production system, just to
buy some time until the budget was allocated to buy a real array. Having
said that, the raid code is written to withstand the USB disks getting
disconnected as long as the driver reports it properly. Since it doesn't,
I consider it another case that shows when not to use software RAID
thinking that it will work.
As for SCSI, I think it is a well-proven and reliable technology; I've
dealt with it extensively and have always had great results. I now deal
with it mostly on non-Linux-based systems. But I don't think it is
affordable to most SMBs that need multi-terabyte arrays.
>
> > I'll repeat my plea one more time. Is there a published list
> > of tested combinations that respond well to hardware failures
> > and fully signals the md code so that nothing hangs?
>
> I don't know of one, but like I said, I've not used a lot of the SATA
> stuff for production. I would make this one suggestion though, SATA is
> still an evolving driver stack to a certain extent, and as such, keeping
> with more current kernels than you have been using is likely to be a big
> factor in whether or not these sorts of things happen.
OK, so based on this it seems that you would not recommend the use
of SATA for production systems due to its immaturity, correct? Keep in
mind that production systems are not able to be brought down just to
keep up with kernel changes. We have some tru64 production servers with
1500 to 2500 days of uptime; that's not uncommon in industry.
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-01 5:08 ` Alberto Alonso
@ 2007-11-01 14:14 ` Bill Davidsen
2007-11-01 19:16 ` Doug Ledford
1 sibling, 0 replies; 28+ messages in thread
From: Bill Davidsen @ 2007-11-01 14:14 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Doug Ledford, Samuel Tardieu, linux-raid
Alberto Alonso wrote:
> On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
>
>> Really, you've only been bitten by three so far. Serverworks PATA
>> (which I tend to agree with the other person, I would probably chalk
>>
>
> 3 types of bugs is too many, it basically affected all my customers
> with multi-terabyte arrays. Heck, we can also oversimplify things and
> say that it is really just one type and define everything as kernel type
> problems (or as some other kernel used to say... general protection
> error).
>
> I am sorry for not having hundreds of RAID servers from which to draw
> statistical analysis. As I have clearly stated in the past I am trying
> to come up with a list of known combinations that work. I think my
> data points are worth something to some people, especially those
> considering SATA drives and software RAID for their file servers. If
> you don't consider them important for you that's fine, but please don't
> belittle them just because they don't match your needs.
>
>
>> this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
>> is arranged similar to the SCSI stack with a core library that all the
>> drivers use, and then hardware dependent driver modules...I suspect that
>> since you got bit on three different hardware versions that you were in
>> fact hitting a core library bug, but that's just a suspicion and I could
>> well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff,
>> and generally that's what I've always used and had good things to say
>> about. I've only used SATA for my home systems or workstations, not any
>> production servers.
>>
>
> The USB array was never meant to be a full production system, just to
> buy some time until the budget was allocated to buy a real array. Having
> said that, the raid code is written to withstand the USB disks getting
> disconnected as long as the driver reports it properly. Since it doesn't,
> I consider it another case that shows when not to use software RAID
> thinking that it will work.
>
> As for SCSI, I think it is a well-proven and reliable technology; I've
> dealt with it extensively and have always had great results. I now deal
> with it mostly on non-Linux-based systems. But I don't think it is
> affordable to most SMBs that need multi-terabyte arrays.
>
Actually, SCSI can fail as well. Until recently I was running servers
with multi-TB arrays, and regularly, several times a year, a drive would
fail and glitch the SCSI bus such that the next i/o to another drive
would fail. And I've had SATA drives fail cleanly on small machines, so
neither is an "always" config.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-30 5:08 ` Alberto Alonso
2007-10-30 12:12 ` Gabor Gombas
2007-10-30 17:58 ` Doug Ledford
@ 2007-11-01 14:19 ` Bill Davidsen
2 siblings, 0 replies; 28+ messages in thread
From: Bill Davidsen @ 2007-11-01 14:19 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Doug Ledford, linux-raid
Alberto Alonso wrote:
> On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
>
>
>> What kernels were these under?
>>
>
>
> Yes, these 3 were all SATA. The kernels (in the same order as above)
> are:
>
> * 2.4.21-4.ELsmp #1 (Basically RHEL v3)
> * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
> * 2.6.17.13 (compiled from vanilla sources)
>
*Old* kernels. If you are going to build your own kernel, get a new one!
> The RocketRAID was configured for all drives as legacy/normal and
> software RAID5 across all drives. I wasn't using hardware raid on
> the last described system when it crashed.
>
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-01 5:08 ` Alberto Alonso
2007-11-01 14:14 ` Bill Davidsen
@ 2007-11-01 19:16 ` Doug Ledford
2007-11-02 8:41 ` Alberto Alonso
1 sibling, 1 reply; 28+ messages in thread
From: Doug Ledford @ 2007-11-01 19:16 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Samuel Tardieu, linux-raid
[-- Attachment #1: Type: text/plain, Size: 6206 bytes --]
On Thu, 2007-11-01 at 00:08 -0500, Alberto Alonso wrote:
> On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
> >
> > Really, you've only been bitten by three so far. Serverworks PATA
> > (which I tend to agree with the other person, I would probably chalk
>
> 3 types of bugs is too many, it basically affected all my customers
> with multi-terabyte arrays. Heck, we can also oversimplify things and
> say that it is really just one type and define everything as kernel type
> problems (or as some other kernel used to say... general protection
> error).
>
> I am sorry for not having hundreds of RAID servers from which to draw
> statistical analysis. As I have clearly stated in the past I am trying
> to come up with a list of known combinations that work. I think my
> data points are worth something to some people, especially those
> considering SATA drives and software RAID for their file servers. If
> you don't consider them important for you that's fine, but please don't
> belittle them just because they don't match your needs.
I wasn't belittling them. I was trying to isolate the likely culprit in
the situations. You seem to want the md stack to time things out. As
has already been commented by several people, myself included, that's a
band-aid and not a fix in the right place. The linux kernel community
in general is pretty hard-line when it comes to fixing a bug in the
wrong way.
> > this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
> > is arranged similar to the SCSI stack with a core library that all the
> > drivers use, and then hardware dependent driver modules...I suspect that
> > since you got bit on three different hardware versions that you were in
> > fact hitting a core library bug, but that's just a suspicion and I could
> > well be wrong). What you haven't tried is any of the SCSI/SAS/FC stuff,
> > and generally that's what I've always used and had good things to say
> > about. I've only used SATA for my home systems or workstations, not any
> > production servers.
>
> The USB array was never meant to be a full production system, just to
> buy some time until the budget was allocated to buy a real array. Having
> said that, the raid code is written to withstand the USB disks getting
> disconnected as long as the driver reports it properly. Since it doesn't,
> I consider it another case that shows when not to use software RAID
> thinking that it will work.
>
> As for SCSI, I think it is a well-proven and reliable technology; I've
> dealt with it extensively and have always had great results. I now deal
> with it mostly on non-Linux-based systems. But I don't think it is
> affordable to most SMBs that need multi-terabyte arrays.
>
> >
> > > I'll repeat my plea one more time. Is there a published list
> > > of tested combinations that respond well to hardware failures
> > > and fully signal the md code so that nothing hangs?
> >
> > I don't know of one, but like I said, I've not used a lot of the SATA
> > stuff for production. I would make this one suggestion though, SATA is
> > still an evolving driver stack to a certain extent, and as such, keeping
> > with more current kernels than you have been using is likely to be a big
> > factor in whether or not these sorts of things happen.
>
> OK, so based on this it seems that you would not recommend the use
> of SATA for production systems due to its immaturity, correct?
Not in the older kernel versions you were running, no.
> Keep in
> mind that production systems are not able to be brought down just to
> keep up with kernel changes. We have some tru64 production servers with
> 1500 to 2500 days of uptime; that's not uncommon in industry.
And I guarantee not a single one of those systems even knows what SATA
is. They all use tried and true SCSI/FC technology.
In any case, if Neil is so inclined to do so, he can add timeout code
into the md stack, it's not my decision to make.
However, I would say that the current RAID subsystem relies on the
underlying disk subsystem to report errors when they occur instead of
hanging infinitely, which implies that the raid subsystem relies upon a
bug-free low-level driver. It is intended to deal with hardware
failure, as much as possible, and a driver bug isn't a hardware
failure. You are asking the RAID subsystem to be extended to deal with
software errors as well.
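Schematically (a simplified illustration, not a copy of drivers/md, and
the surrounding function name is invented; md_error() and the Faulty state
do exist in the md core), the only hook md has is the completion status
handed back by the lower layer:

    #include <linux/raid/md.h>

    /* md only learns about a bad member when a request completes with an
     * error.  If the low-level driver never completes the request at all,
     * this path is never reached and the array just waits. */
    static void member_io_done(mddev_t *mddev, mdk_rdev_t *rdev, int uptodate)
    {
            if (!uptodate)
                    md_error(mddev, rdev);  /* mark the member Faulty and
                                               carry on degraded */
    }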
Even though you may have thought it should handle this type of failure
when you put those systems together, it in fact was not designed to do
so. For that reason, choice of hardware and status of drivers for
specific versions of hardware is important, and therefore it is also
important to keep up to date with driver updates.
It's highly likely that had you been keeping up to date with kernels,
several of those failures might not have happened. One of the benefits
of having many people running a software setup is that when one person
hits a bug and you fix it, and then distribute that fix to everyone
else, you save everyone else from also hitting that bug. You have
chosen to use relatively new hardware from the OS driver standpoint
(well, not so new now, but it certainly was back when you installed
several of those failed systems), but opted out of keeping up to date
with the kernels that very well may have prevented what happened to you.
There are trade offs in every situation. If your SMB customers can't
afford years old but well tested and verified hardware to build their
terabyte arrays from, then the reasonable trade off for using more
modern and less tested hardware is that they need to be willing to deal
with occasional maintenance downtime to update kernels or risk what
happened to them. Just as your tru64 uptimes are fairly industry
standard, so is it pretty industry standard that lower cost/newer
hardware comes with compromises.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-01 19:16 ` Doug Ledford
@ 2007-11-02 8:41 ` Alberto Alonso
2007-11-02 11:09 ` David Greaves
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-11-02 8:41 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> I wasn't belittling them. I was trying to isolate the likely culprit in
> the situations. You seem to want the md stack to time things out. As
> has already been commented by several people, myself included, that's a
> band-aid and not a fix in the right place. The linux kernel community
> in general is pretty hard-line when it comes to fixing a bug in the
> wrong way.
It did sound as if I was complaining about nothing and that I shouldn't
bother the linux-raid people and instead just continuously update the
kernel and stop raising issues. If I misunderstood you I'm sorry, but
somehow I still think that belittling my problems was implied in your
responses.
> Not in the older kernel versions you were running, no.
These "old versions" (specially the RHEL) are supposed to be
the official versions supported by Redhat and the hardware
vendors, as they were very specific as to what versions of
Linux were supported. Of all people, I would think you would
appreciate that. Sorry if I sound frustrated and upset, but
it is clearly a result of what "supported and tested" really
means in this case. I don't want to go into a discussion of
commercial distros, which are "supported" as this is nor the
time nor the place but I don't want to open the door to the
excuse of "its an old kernel", it wasn't when it got installed.
> And I guarantee not a single one of those systems even knows what SATA
> is. They all use tried and true SCSI/FC technology.
Sure, the tru64 units I talked about don't use SATA (although
some did use PATA). I'll concede that point.
> In any case, if Neil is so inclined to do so, he can add timeout code
> into the md stack, it's not my decision to make.
The timeout was nothing more than a suggestion based on what
I consider a reasonable expectation of usability. Neil said no
and I respect that. If I didn't, I could always write my own as
per the open source model :-) But I am not inclined to do so.
Outside of the rejected suggestion, I just want to figure out
when software raid works and when it doesn't. With SATA, my
experience is that it doesn't. So far I've only received one
response stating success (they were using the 3ware and Areca
product lines).
Anyway, this thread just posed the question, and as Neil pointed
out, it isn't feasible/worthwhile to implement timeouts within the md
code. I think most of the points/discussions raised beyond that
original question really belong to the thread "Software RAID when
it works and when it doesn't"
I do appreciate all comments and suggestions and I hope to keep
them coming. I would hope, however, to hear more about success
stories with specific hardware details. It would be helpful
to have a list of tested configurations that are known to work.
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 8:41 ` Alberto Alonso
@ 2007-11-02 11:09 ` David Greaves
2007-11-02 17:47 ` Alberto Alonso
2007-11-02 12:44 ` Bill Davidsen
2007-11-02 15:45 ` Doug Ledford
2 siblings, 1 reply; 28+ messages in thread
From: David Greaves @ 2007-11-02 11:09 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Doug Ledford, linux-raid
Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
>> Not in the older kernel versions you were running, no.
>
> These "old versions" (specially the RHEL) are supposed to be
> the official versions supported by Redhat and the hardware
> vendors, as they were very specific as to what versions of
> Linux were supported. Of all people, I would think you would
> appreciate that. Sorry if I sound frustrated and upset, but
> it is clearly a result of what "supported and tested" really
> means in this case. I don't want to go into a discussion of
> commercial distros, which are "supported" as this is nor the
> time nor the place but I don't want to open the door to the
> excuse of "its an old kernel", it wasn't when it got installed.
It may be worth noting that the context of this email is the upstream linux-raid
list. In my time watching the list it is mainly focused on 'current' code and
development (but hugely supportive of older environments).
In general, discussions in this context will have a certain mindset - and it's
not going to be the same as that which you'd find in an enterprise product
support list.
> Outside of the rejected suggestion, I just want to figure out
> when software raid works and when it doesn't. With SATA, my
> experience is that it doesn't.
SATA, or more precisely, error handling in SATA has recently been significantly
overhauled by Tejun Heo (IIRC). We're talking post 2.6.18 though (again IIRC) -
so as far as SATA EH goes, older kernels bear no relation to the new ones.
And the initial SATA EH code was, of course, beta :)
David
PS I can't really contribute to your list - I'm only using cheap desktop hardware.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 8:41 ` Alberto Alonso
2007-11-02 11:09 ` David Greaves
@ 2007-11-02 12:44 ` Bill Davidsen
2007-11-02 15:45 ` Doug Ledford
2 siblings, 0 replies; 28+ messages in thread
From: Bill Davidsen @ 2007-11-02 12:44 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Doug Ledford, linux-raid
Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
>
>> Not in the older kernel versions you were running, no.
>>
>
> These "old versions" (specially the RHEL) are supposed to be
> the official versions supported by Redhat and the hardware
> vendors, as they were very specific as to what versions of
> Linux were supported.
So the vendors of the failing drives claimed that these kernels were
supported? That's great, most vendors don't even consider Linux
supported. What response did you get when you reported the problem to
Redhat on your RHEL support contract? Did they agree that this hardware,
and its use for software raid, was supported and intended?
> Of all people, I would think you would
> appreciate that. Sorry if I sound frustrated and upset, but
> it is clearly a result of what "supported and tested" really
> means in this case. I don't want to go into a discussion of
> commercial distros, which are "supported" as this is nor the
> time nor the place but I don't want to open the door to the
> excuse of "its an old kernel", it wasn't when it got installed.
>
The problem is in the time travel module. It didn't properly cope with
future hardware, and since you have very long uptimes, I'm reasonably
sure you haven't updated the kernel to get fixes installed.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 8:41 ` Alberto Alonso
2007-11-02 11:09 ` David Greaves
2007-11-02 12:44 ` Bill Davidsen
@ 2007-11-02 15:45 ` Doug Ledford
2007-11-02 18:21 ` Alberto Alonso
2 siblings, 1 reply; 28+ messages in thread
From: Doug Ledford @ 2007-11-02 15:45 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 4598 bytes --]
On Fri, 2007-11-02 at 03:41 -0500, Alberto Alonso wrote:
> On Thu, 2007-11-01 at 15:16 -0400, Doug Ledford wrote:
> > Not in the older kernel versions you were running, no.
>
> These "old versions" (specially the RHEL) are supposed to be
> the official versions supported by Redhat and the hardware
> vendors, as they were very specific as to what versions of
> Linux were supported.
The key word here being "supported". That means if you run across a
problem, we fix it. It doesn't mean there will never be any problems.
> Of all people, I would think you would
> appreciate that. Sorry if I sound frustrated and upset, but
> it is clearly a result of what "supported and tested" really
> means in this case.
I'm sorry, but given the "specially the RHEL" case you cited, it is
clear I can't help you. No one can. You were running first gen
software on first gen hardware. You show me *any* software company
whose first gen software never has to be updated to fix bugs, and I'll
show you a software company that went out of business the day after
they released their software.
Our RHEL3 update kernels contained *significant* updates to the SATA
stack after our GA release, replete with hardware driver updates and bug
fixes. I don't know *when* that RHEL3 system failed, but I would
venture a guess that it wasn't prior to RHEL3 Update 1. So, I'm
guessing you didn't take advantage of those bug fixes. And I would
hardly call once a quarter "continuously updating" your kernel. In any
case, given your insistence on running first gen software on first gen
hardware and not taking advantage of the support we *did* provide to
protect you against that failure, I say again that I can't help you.
> I don't want to go into a discussion of
> commercial distros, which are "supported" as this is nor the
> time nor the place but I don't want to open the door to the
> excuse of "its an old kernel", it wasn't when it got installed.
I *really* can't help you.
> Outside of the rejected suggestion, I just want to figure out
> when software raid works and when it doesn't. With SATA, my
> experience is that it doesn't. So far I've only received one
> response stating success (they were using the 3ware and Areca
> product lines).
No, your experience, as you listed it, is that
SATA/usb-storage/Serverworks PATA failed you. The software raid never
failed to perform as designed.
However, one of the things you are doing here is drawing sweeping
generalizations that are totally invalid. You are saying your
experience is that SATA doesn't work, but you aren't qualifying it with
the key factor: SATA doesn't work in what kernel version? It is
pointless to try and establish whether or not something like SATA works
in a global, all kernel inclusive fashion because the answer to the
question varies depending on the kernel version. And the same is true
of pretty much every driver you can name. This is why commercial
companies don't just certify hardware, but the software version that
actually works as opposed to all versions. In truth, you have *no idea*
if SATA works today, because you haven't tried. As David pointed out,
there was a significant overhaul of the SATA error recovery that took
place *after* the kernel versions that failed you which totally
invalidates your experiences and requires retesting of the later
software to see if it performs differently.
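For anyone who does want to report results with those qualifiers attached, here is a minimal sketch of the sort of thing to capture. It is purely illustrative (it assumes Python 3 and the standard sysfs layout, neither of which is anything the thread itself specifies) and just prints the running kernel plus each disk's identity strings and the sysfs path that names the controller it hangs off:

#!/usr/bin/env python3
# Illustrative sketch: gather the details worth quoting in a success or
# failure report, i.e. kernel version plus each disk's identity and the
# controller path it sits behind. Assumes Python 3 and standard sysfs.
import glob
import os
import platform

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "?"

print("kernel:", platform.release())
for disk in sorted(glob.glob("/sys/block/sd*")):
    name = os.path.basename(disk)
    vendor = read(os.path.join(disk, "device", "vendor"))
    model = read(os.path.join(disk, "device", "model"))
    # The resolved sysfs path shows which controller and port the disk
    # sits on, e.g. .../0000:00:1f.2/ata1/host0/target0:0:0/...
    print(f"{name}: {vendor} {model} ({os.path.realpath(disk)})")

Output like that answers the "in what kernel version, behind what controller" question before anyone has to ask it.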
> Anyway, this thread just posed the question, and as Neil pointed
> out, it isn't feasible/worthwhile to implement timeouts within the md
> code. I think most of the points/discussions raised beyond that
> original question really belong to the thread "Software RAID when
> it works and when it doesn't"
>
> I do appreciate all comments and suggestions and I hope to keep
> them coming. I would hope, however, to hear more about success
> stories with specific hardware details. It would be helpful
> to have a list of tested configurations that are known to work.
I've had *lots* of success with software RAID as I've been running it
for years. I've had old PATA drives fail, SCSI drives fail, FC drives
fail, and I've had SATA drives that got kicked from the array due to
read errors but not out and out drive failures. But I keep at least
reasonably up to date with my kernels.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 11:09 ` David Greaves
@ 2007-11-02 17:47 ` Alberto Alonso
0 siblings, 0 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-11-02 17:47 UTC (permalink / raw)
To: David Greaves; +Cc: Doug Ledford, linux-raid
On Fri, 2007-11-02 at 11:09 +0000, David Greaves wrote:
> David
> PS I can't really contribute to your list - I'm only using cheap desktop hardware.
> -
If you had failures and it properly handled them, then you can
contribute to the good combinations; so far that's the list
that is kind of empty :-(
Thanks,
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 15:45 ` Doug Ledford
@ 2007-11-02 18:21 ` Alberto Alonso
2007-11-02 19:15 ` Doug Ledford
0 siblings, 1 reply; 28+ messages in thread
From: Alberto Alonso @ 2007-11-02 18:21 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote:
> The key word here being "supported". That means if you run across a
> problem, we fix it. It doesn't mean there will never be any problems.
On hardware specs I normally read "supported" as "tested within that
OS version to work within specs". I may be expecting too much.
> I'm sorry, but given the "especially the RHEL" case you cited, it is
> clear I can't help you. No one can. You were running first gen
> software on first gen hardware. You show me *any* software company
> whose first gen software never has to be updated to fix bugs, and I'll
> show you a software company that went out of business the day after
> they released their software.
I only pointed to RHEL as an example since that was a particular
distro that I use and that exhibited the problem. I probably could have
replaced it with Suse, Ubuntu, etc. I may have called the early
versions back in '94 first gen, but not today's versions. I know I
didn't expect the SLS distro to work reliably back then.
Thanks for reminding me of what I should and shouldn't consider
first gen. I guess I should always wait for a couple of updates
prior to considering a distro stable; I'll keep that in mind in
the future.
> I *really* can't help you.
And I never expected you to. None of my posts asked for support
to get my specific hardware and kernels working. I did ask for
help identifying combinations that work and those that don't.
The thread on low level timeouts within MD was meant as a forward-thinking
question to see if it could solve some of these problems.
It has been settled that it can't, so that's that. I am really not trying
to push the issue with MD timeouts.
> No, your experience, as you listed it, is that
> SATA/usb-storage/Serverworks PATA failed you. The software raid never
> failed to perform as designed.
And I never said that software raid did anything outside what it
was designed to do. I did state that when the goal is to keep the
server from hanging (a reasonable goal, if you ask me) the combination
of SATA/usb-storage/Serverworks PATA with software raid is not
a working solution (nor is it without software raid, for that
matter).
> However, one of the things you are doing here is drawing sweeping
> generalizations that are totally invalid. You are saying your
> experience is that SATA doesn't work, but you aren't qualifying it with
> the key factor: SATA doesn't work in what kernel version? It is
> pointless to try and establish whether or not something like SATA works
> in a global, all kernel inclusive fashion because the answer to the
> question varies depending on the kernel version. And the same is true
> of pretty much every driver you can name. This is why commercial
At the time of purchase the hardware vendor (Supermicro, for those
interested) listed RHEL v3, which is what got installed.
> companies don't just certify hardware, but the software version that
> actually works as opposed to all versions. In truth, you have *no idea*
> if SATA works today, because you haven't tried. As David pointed out,
> there was a significant overhaul of the SATA error recovery that took
> place *after* the kernel versions that failed you which totally
> invalidates your experiences and requires retesting of the later
> software to see if it performs differently.
I completely agree that retesting is needed based on the improvements
stated. I don't think it invalidates my experiences, though; it does
date them, but that's fine. And yes, I see your point on always listing
specific kernel versions; I will do better with the details in the
future.
> I've had *lots* of success with software RAID as I've been running it
> for years. I've had old PATA drives fail, SCSI drives fail, FC drives
> fail, and I've had SATA drives that got kicked from the array due to
> read errors but not out and out drive failures. But I keep at least
> reasonably up to date with my kernels.
>
Can you provide specific chipsets that you used (especially for SATA)?
Thanks,
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 18:21 ` Alberto Alonso
@ 2007-11-02 19:15 ` Doug Ledford
2007-11-02 21:24 ` Alberto Alonso
0 siblings, 1 reply; 28+ messages in thread
From: Doug Ledford @ 2007-11-02 19:15 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]
On Fri, 2007-11-02 at 13:21 -0500, Alberto Alonso wrote:
> On Fri, 2007-11-02 at 11:45 -0400, Doug Ledford wrote:
>
> > The key word here being "supported". That means if you run across a
> > problem, we fix it. It doesn't mean there will never be any problems.
>
> On hardware specs I normally read "supported" as "tested within that
> OS version to work within specs". I may be expecting too much.
It was tested; it simply had a bug that you hit. Assuming that
your particular failure situation is the only possible outcome for all
the other people who used it would be an invalid assumption. There are
lots of code paths in an error handler routine, and lots of different
hardware failure scenarios, and they each have their own independent
outcome should they ever be experienced.
> > I'm sorry, but given the "especially the RHEL" case you cited, it is
> > clear I can't help you. No one can. You were running first gen
> > software on first gen hardware. You show me *any* software company
> > whose first gen software never has to be updated to fix bugs, and I'll
> > show you a software company that went out of business the day after
> > they released their software.
>
> I only pointed to RHEL as an example since that was a particular
> distro that I use and that exhibited the problem. I probably could have
> replaced it with Suse, Ubuntu, etc. I may have called the early
> versions back in '94 first gen, but not today's versions. I know I
> didn't expect the SLS distro to work reliably back then.
Then you didn't pay attention to what I said before: RHEL3 was the first
ever RHEL product that had support for SATA hardware. The SATA drivers
in RHEL3 *were* first gen.
> Can you provide specific chipsets that you used (especially for SATA)?
All of the Adaptec SCSI chipsets through the 7899, Intel PATA, QLogic
FC, and nVidia- and Winbond-based SATA.
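For anyone compiling the list of known-good and known-bad combinations, it also helps to name the exact controller and driver a box is using. The sketch below is illustrative only: it assumes Python 3 and the usual sysfs layout, and the class-code table is just the standard PCI storage classes; none of it is taken from this thread.

#!/usr/bin/env python3
# Illustrative sketch: list the PCI storage controllers in a machine and
# the kernel driver bound to each, so a report can name the exact chipset.
import glob
import os

STORAGE_CLASSES = {
    "0x0100": "SCSI",
    "0x0101": "IDE",
    "0x0104": "RAID",
    "0x0106": "SATA",
    "0x0107": "SAS",
}

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "?"

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    pci_class = read(os.path.join(dev, "class"))[:6]  # "0x010601" -> "0x0106"
    if pci_class not in STORAGE_CLASSES:
        continue
    vendor = read(os.path.join(dev, "vendor"))
    device = read(os.path.join(dev, "device"))
    driver_link = os.path.join(dev, "driver")
    driver = (os.path.basename(os.path.realpath(driver_link))
              if os.path.islink(driver_link) else "none")
    print(f"{os.path.basename(dev)}: {STORAGE_CLASSES[pci_class]}"
          f" vendor={vendor} device={device} driver={driver}")

The vendor and device IDs can then be looked up in pci.ids to get the chipset's marketing name.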
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-11-02 19:15 ` Doug Ledford
@ 2007-11-02 21:24 ` Alberto Alonso
0 siblings, 0 replies; 28+ messages in thread
From: Alberto Alonso @ 2007-11-02 21:24 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Fri, 2007-11-02 at 15:15 -0400, Doug Ledford wrote:
> It was tested; it simply had a bug that you hit. Assuming that
> your particular failure situation is the only possible outcome for all
> the other people who used it would be an invalid assumption. There are
> lots of code paths in an error handler routine, and lots of different
> hardware failure scenarios, and they each have their own independent
> outcome should they ever be experienced.
This is the kind of statement that made me say you were belittling my
experiences.
And thinking that it won't affect others, when I've hit it on three
different machines with different hardware and different kernel
versions, is something else. I thought I was helping, but don't worry, I
learned my lesson; it won't happen again. I asked people for their
experiences; clearly not everybody is as lucky as I am.
> Then you didn't pay attention to what I said before: RHEL3 was the first
> ever RHEL product that had support for SATA hardware. The SATA drivers
> in RHEL3 *were* first gen.
Oh, I paid attention alright. It is my fault for assuming that things
not marked as experimental are not experimental.
Alberto
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Implementing low level timeouts within MD
2007-10-29 17:22 ` Doug Ledford
2007-10-30 5:08 ` Alberto Alonso
@ 2007-11-07 8:47 ` Goswin von Brederlow
1 sibling, 0 replies; 28+ messages in thread
From: Goswin von Brederlow @ 2007-11-07 8:47 UTC (permalink / raw)
To: Doug Ledford; +Cc: Alberto Alonso, linux-raid
Doug Ledford <dledford@redhat.com> writes:
> On Sun, 2007-10-28 at 01:27 -0500, Alberto Alonso wrote:
>> Even if the default timeout was really long (i.e. 1 minute) and then
>> configurable on a per-device (or class) basis via /proc, it would really help.
>
> It's a band-aid. It's working around other bugs in the kernel instead
> of fixing the real problem.
Checks and balances.
Don't trust anyone.
Also some things are broken in hardware. I have external raid boxes
connected via SCSI, but when I just power down one of the boxes it
never gives an I/O error or timeout. It seems to just hang. Should I
start fixing the SCSI controller firmware to give errors?
A general timeout in the md layer would fix the problems independent
of what controller card I use. A real nifty band-aid. Call it a safety
blanket.
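For what it is worth, a safety blanket of that sort can be approximated from userspace without touching md at all. The sketch below is exactly that kind of band-aid and is only an illustration: the device list, the 60 second deadline (echoing the "really long" default suggested above) and the decision to call mdadm are all invented for the example, and if the low-level driver is truly wedged the mdadm call itself can hang, which is the limitation already pointed out in this thread.

#!/usr/bin/env python3
# Illustrative watchdog: time a small read against each member of an
# array and, if it does not come back within a deadline, log it and try
# to fail the member via mdadm. Needs root to open raw block devices.
import multiprocessing
import os
import subprocess
import sys

MEMBERS = {"/dev/md0": ["/dev/sda1", "/dev/sdb1"]}  # hypothetical layout
DEADLINE = 60  # seconds

def probe(dev):
    # Read one block from the member; drop any cached copy first so the
    # read actually has to touch the hardware.
    fd = os.open(dev, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 4096, os.POSIX_FADV_DONTNEED)
        os.read(fd, 4096)
    finally:
        os.close(fd)

def main():
    for array, members in MEMBERS.items():
        for dev in members:
            p = multiprocessing.Process(target=probe, args=(dev,))
            p.start()
            p.join(DEADLINE)
            if p.is_alive():
                # The probe may be stuck in D state and ignore the signal;
                # all we can really do is report and attempt the fail.
                p.terminate()
                print(f"{dev} did not answer within {DEADLINE}s",
                      file=sys.stderr)
                subprocess.run(["mdadm", "--manage", array, "--fail", dev])

if __name__ == "__main__":
    main()

Run from cron, that at least turns a silent hang into a log line, but it is no substitute for a low-level driver that times out on its own.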
>> > Generally speaking, most modern drivers will work well. It's easier to
>> > maintain a list of known bad drivers than known good drivers.
>>
>> That's what has been so frustrating. The old PATA IDE hardware always
>> worked and the new stuff is what has crashed.
>
> In all fairness, the SATA core is still relatively young. IDE was
> around for eons, where as Jeff started the SATA code just a few years
> back. In that time I know he's had to deal with both software bugs and
> hardware bugs that would lock a SATA port up solid with no return. What
> it sounds like to me is you found some of those.
There will always be bugs. I've seen SCSI crash on error often enough
due to totally broken drivers. You always run into that one odd race
condition the author never thought about when it really counts [MURPHY].
Regards,
Goswin
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2007-11-07 8:47 UTC | newest]
Thread overview: 28+ messages
2007-10-26 17:12 Implementing low level timeouts within MD Alberto Alonso
2007-10-26 19:00 ` Doug Ledford
2007-10-27 10:33 ` Samuel Tardieu
2007-10-30 5:19 ` Alberto Alonso
2007-10-30 17:39 ` Doug Ledford
2007-11-01 5:08 ` Alberto Alonso
2007-11-01 14:14 ` Bill Davidsen
2007-11-01 19:16 ` Doug Ledford
2007-11-02 8:41 ` Alberto Alonso
2007-11-02 11:09 ` David Greaves
2007-11-02 17:47 ` Alberto Alonso
2007-11-02 12:44 ` Bill Davidsen
2007-11-02 15:45 ` Doug Ledford
2007-11-02 18:21 ` Alberto Alonso
2007-11-02 19:15 ` Doug Ledford
2007-11-02 21:24 ` Alberto Alonso
2007-10-27 21:46 ` Alberto Alonso
2007-10-27 23:55 ` Doug Ledford
2007-10-28 6:27 ` Alberto Alonso
2007-10-29 17:22 ` Doug Ledford
2007-10-30 5:08 ` Alberto Alonso
2007-10-30 12:12 ` Gabor Gombas
2007-10-30 17:58 ` Doug Ledford
2007-11-01 14:19 ` Bill Davidsen
2007-11-07 8:47 ` Goswin von Brederlow
2007-10-27 18:59 ` Richard Scobie
[not found] ` <1193522726.7690.31.camel@w100>
2007-10-27 23:46 ` Richard Scobie
2007-10-30 4:47 ` Neil Brown