* Software RAID when it works and when it doesn't
@ 2007-10-13 18:40 Alberto Alonso
2007-10-13 22:46 ` Eyal Lebedinsky
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-13 18:40 UTC (permalink / raw)
To: linux-raid
Over the past several months I have encountered 3
cases where software RAID did not keep the servers
up and running.
In all cases the failure was on a single drive,
yet the whole md device and the server became unresponsive.
(usb-storage)
In one situation a RAID 0 across 2 USB drives failed
when one of the drives accidentally got turned off.
(sata)
In a second case a disk started generating reports like:
end_request: I/O error, dev sdb, sector 42644555
(sata)
The third case (which I'm living through right now) is a disk
that I can see during the boot process but on which I can't
get any operation to complete (i.e. fdisk -l /dev/sdc just hangs).
(pata)
I have had at least 4 situations on old servers based
on PATA disks where disk failures were successfully
flagged and the arrays were degraded automatically.
So, this is all making me wonder under what circumstances
software RAID may have problems detecting disk failures.
I need to come up with a best-practices solution and also
need to understand more as I move into RAID over the local
network (i.e. iSCSI, AoE or NBD). Could a disk failure in
one of the servers or a server going offline bring the
whole array down?
Thanks for any information or comments,
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
@ 2007-10-13 22:46 ` Eyal Lebedinsky
2007-10-13 22:50 ` Neil Brown
[not found] ` <471241F8.50205@harddata.com>
2 siblings, 0 replies; 23+ messages in thread
From: Eyal Lebedinsky @ 2007-10-13 22:46 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
RAID0 is non-redundant, so a disk failure will correctly fail the array.
Alberto Alonso wrote:
> Over the past several months I have encountered 3
> cases where the software RAID didn't work in keeping
> the servers up and running.
>
> In all cases, the failure has been on a single drive,
> yet the whole md device and server become unresponsive.
>
> (usb-storage)
> In one situation a RAID 0 across 2 USB drives failed
> when one of the drives accidentally got turned off.
>
> (sata)
> A second case a disk started generating reports like:
> end_request: I/O error, dev sdb, sector 42644555
>
> (sata)
> The third case (which I'm living right now) is a disk
> that I can see during the boot process but that I can't
> get operations on it to come back (ie. fdisk -l /dev/sdc).
>
> (pata)
> I have had at least 4 situations on old servers based
> on pata disks where disk failures where successful in
> being flagged and arrays where degraded automatically.
>
> So, this is all making me wonder under what circumstances
> software RAID may have problems detecting disk failures.
>
> I need to come up with a best practices solution and also
> need to understand more as I move into raid over local
> network (ie. iscsi, AoE or NBD). Could a disk failure in
> one of the servers or a server going offline bring the
> whole array down?
>
> Thanks for any information or comments,
>
> Alberto
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: Software RAID when it works and when it doesn't
2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
2007-10-13 22:46 ` Eyal Lebedinsky
@ 2007-10-13 22:50 ` Neil Brown
2007-10-14 5:57 ` Alberto Alonso
[not found] ` <471241F8.50205@harddata.com>
2 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2007-10-13 22:50 UTC (permalink / raw)
To: Alberto Alonso; +Cc: linux-raid
On Saturday October 13, alberto@ggsys.net wrote:
> Over the past several months I have encountered 3
> cases where the software RAID didn't work in keeping
> the servers up and running.
>
> In all cases, the failure has been on a single drive,
> yet the whole md device and server become unresponsive.
>
> (usb-storage)
> In one situation a RAID 0 across 2 USB drives failed
> when one of the drives accidentally got turned off.
RAID0 is not true RAID - there is no redundancy. If one device in a
RAID0 fails, the whole array will fail. This is expected.
>
> (sata)
> A second case a disk started generating reports like:
> end_request: I/O error, dev sdb, sector 42644555
So the drive had errors - not uncommon. What happened to the array?
>
> (sata)
> The third case (which I'm living right now) is a disk
> that I can see during the boot process but that I can't
> get operations on it to come back (ie. fdisk -l /dev/sdc).
You mean "fdisk -l /dev/sdc" just hangs? That sounds like a SATA
driver error. You should report it to the SATA developers
linux-ide@vger.kernel.org
md/RAID cannot compensate for problems in the driver code. It expects
every request that it sends down to either succeed or fail in a
reasonable amount of time.
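For reference, that "reasonable amount of time" is mostly set by the
SCSI/libata command timeout, which is tunable per device through sysfs;
a rough sketch (the device name is only an example):
  # per-command timeout in seconds, default 30; a lower value makes a
  # sick disk return an error to md sooner instead of stalling requests
  cat /sys/block/sdc/device/timeout
  echo 10 > /sys/block/sdc/device/timeout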
>
> (pata)
> I have had at least 4 situations on old servers based
> on pata disks where disk failures where successful in
> being flagged and arrays where degraded automatically.
Good!
>
> So, this is all making me wonder under what circumstances
> software RAID may have problems detecting disk failures.
RAID1, RAID10, RAID4, RAID5, RAID6 will handle errors that are
correctly reported by the underlying device.
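You can watch that path by injecting a failure by hand and seeing the
array degrade rather than hang; a sketch (array and member names are
only examples):
  mdadm /dev/md0 --fail /dev/sdc1     # simulate a correctly reported error
  cat /proc/mdstat                    # array shows a missing member but stays up
  mdadm --detail /dev/md0             # the member is listed as faulty
  mdadm /dev/md0 --remove /dev/sdc1
  mdadm /dev/md0 --add /dev/sdc1      # re-add and let it resync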
>
> I need to come up with a best practices solution and also
> need to understand more as I move into raid over local
> network (ie. iscsi, AoE or NBD). Could a disk failure in
> one of the servers or a server going offline bring the
> whole array down?
It shouldn't, providing the low level driver is functioning correctly,
and providing you are using true RAID (not RAID0 or LINEAR).
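md does not care what its members are, so a mirror over exported
network devices is assembled the same way as over local disks; a
sketch, with device names only as examples (NBD members typically
appear as /dev/nbdN, AoE members as /dev/etherd/eX.Y):
  mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        /dev/nbd0 /dev/etherd/e0.0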
NeilBrown
* Re: Software RAID when it works and when it doesn't
2007-10-13 22:50 ` Neil Brown
@ 2007-10-14 5:57 ` Alberto Alonso
2007-10-16 21:57 ` Mike Accetta
0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-14 5:57 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote:
> On Saturday October 13, alberto@ggsys.net wrote:
> > Over the past several months I have encountered 3
> > cases where the software RAID didn't work in keeping
> > the servers up and running.
> >
> > In all cases, the failure has been on a single drive,
> > yet the whole md device and server become unresponsive.
> >
> > (usb-storage)
> > In one situation a RAID 0 across 2 USB drives failed
> > when one of the drives accidentally got turned off.
>
> RAID0 is not true RAID - there is no redundancy. If one device in a
> RAID0 fails, the whole array will fail. This is expected.
Sorry, I meant RAID 1. Currently, we only use RAID 1 and RAID 5 on all
our systems.
>
> >
> > (sata)
> > A second case a disk started generating reports like:
> > end_request: I/O error, dev sdb, sector 42644555
>
> So the drive had errors - not uncommon. What happened to the array?
The array never became degraded; it just made the system
hang. I reported it back in May but couldn't get it
resolved. I replaced the system and unfortunately went
to a non-RAID solution for that server.
> >
> > (sata)
> > The third case (which I'm living right now) is a disk
> > that I can see during the boot process but that I can't
> > get operations on it to come back (ie. fdisk -l /dev/sdc).
>
> You mean "fdisk -l /dev/sdc" just hangs? That sounds like a SATA
> driver error. You should report it to the SATA developers
> linux-ide@vger.kernel.org
>
> md/RAID cannot compensate for problems in the driver code. It expects
> every request that it sends down to either succeed or fail in a
> reasonable amount of time.
Yes, that's exactly what happens: fdisk, dd or any other disk
operation just hangs.
I will report it there, thanks for the pointer.
>
> >
> > (pata)
> > I have had at least 4 situations on old servers based
> > on pata disks where disk failures where successful in
> > being flagged and arrays where degraded automatically.
>
> Good!
Yep, after these results I stopped using hardware RAID. I
went 100% software RAID on all systems other than a few
SCSI hardware RAID systems that we bought as a set. Until this
year that is, when I switched back to hardware RAID for our new
critical systems due to the problems I saw back in May.
>
> >
> > So, this is all making me wonder under what circumstances
> > software RAID may have problems detecting disk failures.
>
> RAID1, RAID10, RAID4, RAID5, RAID6 will handle errors that are
> correctly reported by the underlying device.
Yep, that's what I always thought; I'm just surprised
I had so many problems this year. It makes me wonder about the
reliability of the whole thing though.
Even if the problem is in an underlying layer, can the md code implement
its own timeouts?
>
> >
> > I need to come up with a best practices solution and also
> > need to understand more as I move into raid over local
> > network (ie. iscsi, AoE or NBD). Could a disk failure in
> > one of the servers or a server going offline bring the
> > whole array down?
>
> It shouldn't, providing the low level driver is functioning correctly,
> and providing you are using true RAID (not RAID0 or LINEAR).
> NeilBrown
> -
Sorry again for the RAID 0 mistake, I really did mean RAID 1.
I guess that since I had 3 distinct servers crash on me this year
I am getting paranoid. Is there a test suite or procedure that I can
run to test for everything that can go wrong?
You mentioned that the md code can not compensate for problems in
the driver code. Couldn't some internal timeout mechanisms help?
I can no longer use software RAID on SATA for new production
systems. I've switched to 3ware cards, but they are pricey and we
really don't need them for most of our systems.
I really would like to move to server clusters and RAID on
network devices for our larger arrays, but I need a way to properly
test every scenario, as those are our critical servers and cannot
go down. I would like to figure out a "best practices procedure" that
will ensure the correct degrading of the array upon a single failure,
regardless of the underlying driver (i.e. SATA, iSCSI, NBD, etc.).
Am I overthinking this?
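One way to rehearse this is on a scratch array built from loop
devices; an untested sketch (file sizes and device names are only
examples):
  dd if=/dev/zero of=/tmp/m0 bs=1M count=128
  dd if=/dev/zero of=/tmp/m1 bs=1M count=128
  losetup /dev/loop0 /tmp/m0
  losetup /dev/loop1 /tmp/m1
  mdadm --create /dev/md9 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1
  mdadm /dev/md9 --fail /dev/loop1    # a cleanly reported failure...
  cat /proc/mdstat                    # ...should leave md9 degraded but usable
  # and in production have mdadm mail on Fail/DegradedArray events:
  mdadm --monitor --scan --daemonise --mail root@localhost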
Thanks,
Alberto
* Re: Software RAID when it works and when it doesn't
[not found] ` <471241F8.50205@harddata.com>
@ 2007-10-14 18:22 ` Alberto Alonso
0 siblings, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-14 18:22 UTC (permalink / raw)
To: Maurice Hilarius; +Cc: vger majordomo for lists
On Sun, 2007-10-14 at 10:21 -0600, Maurice Hilarius wrote:
> Alberto Alonso wrote:
> >
> PATA (IDE) with
> Master and Slave drives is a "bad idea" as, when one drive fails, the
> other of the Master & Slave pair often is no longer usable.
> On discrete interfaces, with all drives configured as Master (single)
> it is more tolerant.
Before SATA became the de facto standard we used Promise PCI boards on
top of the built-in channels. As you mentioned, a single drive per
channel is a must. We only had small servers with up to 6 PATA
drives.
This proved to be really reliable and handled all disk failures
without bringing the servers down.
What I am trying to determine in these posts is a combination of
hardware and software that will make software RAID a reliable solution
when disks fail.
> --
> With our best regards,
>
> Maurice W. Hilarius Telephone: 01-780-456-9771
> Hard Data Ltd. FAX: 01-780-456-9772
> 11060 - 166 Avenue email:maurice@harddata.com
> Edmonton, AB, Canada http://www.harddata.com/
> T5X 1Y3
>
Thanks,
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-14 5:57 ` Alberto Alonso
@ 2007-10-16 21:57 ` Mike Accetta
2007-10-16 22:29 ` Richard Scobie
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Mike Accetta @ 2007-10-16 21:57 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Neil Brown, linux-raid
Alberto Alonso writes:
> On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote:
> > On Saturday October 13, alberto@ggsys.net wrote:
> > > Over the past several months I have encountered 3
> > > cases where the software RAID didn't work in keeping
> > > the servers up and running.
> > >
> > > In all cases, the failure has been on a single drive,
> > > yet the whole md device and server become unresponsive.
> > >
> > > (usb-storage)
> > > In one situation a RAID 0 across 2 USB drives failed
> > > when one of the drives accidentally got turned off.
> >
> > RAID0 is not true RAID - there is no redundancy. If one device in a
> > RAID0 fails, the whole array will fail. This is expected.
>
> Sorry, I meant RAID 1. Currently, we only use RAID 1 and RAID 5 on all
> our systems.
>
> >
> > >
> > > (sata)
> > > A second case a disk started generating reports like:
> > > end_request: I/O error, dev sdb, sector 42644555
> >
> > So the drive had errors - not uncommon. What happened to the array?
>
> The array never became degraded, it just made the system
> hang. I reported it back in May, but couldn't get it
> resolved. I replaced the system and unfortunately went
> to a non-RAID solution for that server.
Was the disk driver generating any low level errors or otherwise
indicating that it might be retrying operations on the bad drive at
the time (i.e. console diagnostics)? As Neil mentioned later, the md layer
is at the mercy of the low level disk driver. We've observed abysmal
RAID1 recovery times on failing SATA disks because all the time is
being spent in the driver retrying operations which will never succeed.
Also, read errors don't tend to fail the array so when the bad disk is
again accessed for some subsequent read the whole hopeless retry process
begins anew.
I posted a patch about 6 weeks ago which attempts to improve this situation
for RAID1 by telling the driver not to retry on failures and giving some
weight to read errors for failing the array. Hopefully, Neil is still
mulling it over and it or something similar will eventually make it into
the mainline kernel as a solution for this problem.
--
Mike Accetta
ECI Telecom Ltd.
Transport Networking Division, US (previously Laurel Networks)
* Re: Software RAID when it works and when it doesn't
2007-10-16 21:57 ` Mike Accetta
@ 2007-10-16 22:29 ` Richard Scobie
2007-10-17 21:53 ` Support
2007-10-18 15:26 ` Goswin von Brederlow
2 siblings, 0 replies; 23+ messages in thread
From: Richard Scobie @ 2007-10-16 22:29 UTC (permalink / raw)
To: linux-raid
Mike Accetta wrote:
> is at the mercy of the low level disk driver. We've observed abysmal
> RAID1 recovery times on failing SATA disks because all the time is
> being spent in the driver retrying operations which will never succeed.
> Also, read errors don't tend to fail the array so when the bad disk is
> again accessed for some subsequent read the whole hopeless retry process
> begins anew.
This is one issue I believe the Western Digital RE/RE2 drives address.
The TLER (time-limited error recovery) feature limits retries to try to
prevent this. I have the figure of 5 seconds in my head, but could be wrong.
Not sure if the Seagate nearline range offers the same sort of thing.
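With a new enough smartmontools the same limit can be read and set
from Linux on drives that support SCT ERC; a rough sketch (values are
in tenths of a second, and many desktop drives simply reject it):
  smartctl -l scterc /dev/sdb           # show current read/write limits
  smartctl -l scterc,70,70 /dev/sdb     # cap recovery at 7 seconds each way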
Regards,
Richard
* Re: Software RAID when it works and when it doesn't
2007-10-16 21:57 ` Mike Accetta
2007-10-16 22:29 ` Richard Scobie
@ 2007-10-17 21:53 ` Support
2007-10-18 15:26 ` Goswin von Brederlow
2 siblings, 0 replies; 23+ messages in thread
From: Support @ 2007-10-17 21:53 UTC (permalink / raw)
To: Mike Accetta; +Cc: Neil Brown, linux-raid
On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote:
> Was the disk driver generating any low level errors or otherwise
> indicating that it might be retrying operations on the bad drive at
> the time (i.e. console diagnostics)? As Neil mentioned later, the md layer
> is at the mercy of the low level disk driver. We've observed abysmal
> RAID1 recovery times on failing SATA disks because all the time is
> being spent in the driver retrying operations which will never succeed.
> Also, read errors don't tend to fail the array so when the bad disk is
> again accessed for some subsequent read the whole hopeless retry process
> begins anew.
The console was full of errors like:
end_request: I/O error, dev sdb, sector 42644555
I don't know what generates those messages.
As I asked before but never got an answer, is there a way to do timeouts
within the md code so that we are not at the mercy of the lower layer
drivers?
>
> I posted a patch about 6 weeks ago which attempts to improve this situation
> for RAID1 by telling the driver not to retry on failures and giving some
> weight to read errors for failing the array. Hopefully, Neil is still
> mulling it over and it or something similar will eventually make it into
> the main line kernel as a solution for this problem.
> --
> Mike Accetta
>
Thanks,
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-16 21:57 ` Mike Accetta
2007-10-16 22:29 ` Richard Scobie
2007-10-17 21:53 ` Support
@ 2007-10-18 15:26 ` Goswin von Brederlow
2007-10-19 7:07 ` Alberto Alonso
2 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-18 15:26 UTC (permalink / raw)
To: Mike Accetta; +Cc: Alberto Alonso, Neil Brown, linux-raid
Mike Accetta <maccetta@laurelnetworks.com> writes:
> Also, read errors don't tend to fail the array so when the bad disk is
> again accessed for some subsequent read the whole hopeless retry process
> begins anew.
>
> I posted a patch about 6 weeks ago which attempts to improve this situation
> for RAID1 by telling the driver not to retry on failures and giving some
> weight to read errors for failing the array. Hopefully, Neil is still
> mulling it over and it or something similar will eventually make it into
> the main line kernel as a solution for this problem.
What I would like to see is a timeout-driven fallback mechanism. If
one mirror does not return the requested data within a certain time
(say 1 second) then the request should be duplicated on the other
mirror. If the first mirror later unchokes then it remains in the
raid; if it fails it gets removed. But reads (at least) should not
have to wait for that process.
Even better would be if some write delay could also be used. The
still-working mirror would get an increase in its serial (so on reboot
you know one disk is newer). If the choking mirror unchokes then it can
write back all the delayed data and also increase its serial to
match. Otherwise it gets really failed. But you might have to use
bitmaps for this, or the cache size would limit its usefulness.
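The write-intent bitmap md already supports looks like the natural
building block for that kind of delayed catch-up, since a briefly
failed member that is re-added only resyncs the dirty regions; for
example (device names are only examples):
  mdadm --grow /dev/md0 --bitmap=internal   # add a write-intent bitmap
  mdadm /dev/md0 --re-add /dev/sdc1         # re-added member syncs only dirty chunks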
MfG
Goswin
* Re: Software RAID when it works and when it doesn't
2007-10-18 15:26 ` Goswin von Brederlow
@ 2007-10-19 7:07 ` Alberto Alonso
2007-10-19 15:02 ` Justin Piszcz
2007-10-23 22:45 ` Bill Davidsen
0 siblings, 2 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-19 7:07 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Mike Accetta, Neil Brown, linux-raid
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
> Mike Accetta <maccetta@laurelnetworks.com> writes:
> What I would like to see is a timeout driven fallback mechanism. If
> one mirror does not return the requested data within a certain time
> (say 1 second) then the request should be duplicated on the other
> mirror. If the first mirror later unchokes then it remains in the
> raid, if it fails it gets removed. But (at least reads) should not
> have to wait for that process.
>
> Even better would be if some write delay could also be used. The still
> working mirror would get an increase in its serial (so on reboot you
> know one disk is newer). If the choking mirror unchokes then it can
> write back all the delayed data and also increase its serial to
> match. Otherwise it gets really failed. But you might have to use
> bitmaps for this or the cache size would limit its usefullnes.
>
> MfG
> Goswin
I think a timeout on both reads and writes is a must. Basically, I
believe that all of the problems I've encountered using software
RAID would have been resolved by a timeout within the md code.
This would keep a server from crashing/hanging when the underlying
driver doesn't properly handle hard drive problems. MD can be
smarter than the "dumb" drivers.
Just my thoughts though, as I've never got an answer as to whether or
not md can implement its own timeouts.
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-19 7:07 ` Alberto Alonso
@ 2007-10-19 15:02 ` Justin Piszcz
2007-10-20 13:45 ` Michael Tokarev
2007-10-26 16:11 ` Goswin von Brederlow
2007-10-23 22:45 ` Bill Davidsen
1 sibling, 2 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-19 15:02 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
On Fri, 19 Oct 2007, Alberto Alonso wrote:
> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>> Mike Accetta <maccetta@laurelnetworks.com> writes:
>
>> What I would like to see is a timeout driven fallback mechanism. If
>> one mirror does not return the requested data within a certain time
>> (say 1 second) then the request should be duplicated on the other
>> mirror. If the first mirror later unchokes then it remains in the
>> raid, if it fails it gets removed. But (at least reads) should not
>> have to wait for that process.
>>
>> Even better would be if some write delay could also be used. The still
>> working mirror would get an increase in its serial (so on reboot you
>> know one disk is newer). If the choking mirror unchokes then it can
>> write back all the delayed data and also increase its serial to
>> match. Otherwise it gets really failed. But you might have to use
>> bitmaps for this or the cache size would limit its usefullnes.
>>
>> MfG
>> Goswin
>
> I think a timeout on both: reads and writes is a must. Basically I
> believe that all problems that I've encountered issues using software
> raid would have been resolved by using a timeout within the md code.
>
> This will keep a server from crashing/hanging when the underlying
> driver doesn't properly handle hard drive problems. MD can be
> smarter than the "dumb" drivers.
>
> Just my thoughts though, as I've never got an answer as to whether or
> not md can implement its own timeouts.
>
> Alberto
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
I have a question about re-mapping sectors: can software RAID be as
efficient or as good at remapping bad sectors as an external RAID
controller for, e.g., RAID 10 or RAID 5?
Justin.
* Re: Software RAID when it works and when it doesn't
2007-10-19 15:02 ` Justin Piszcz
@ 2007-10-20 13:45 ` Michael Tokarev
2007-10-20 13:55 ` Justin Piszcz
2007-10-26 16:11 ` Goswin von Brederlow
1 sibling, 1 reply; 23+ messages in thread
From: Michael Tokarev @ 2007-10-20 13:45 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
linux-raid
Justin Piszcz wrote:
[]
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Justin, forgive me please, but can you learn to trim the original
messages when replying, or at least cut off the clearly irrelevant parts?
You're always quoting the whole message, even including the part
after a line consisting of a single minus sign "-" - a part that most
MUAs will remove when replying...
> I have a question with re-mapping sectors, can software raid be as
> efficient or good at remapping bad sectors as an external raid
> controller for, e.g., raid 10 or raid5?
Hard disks ARE remapping bad sectors on their own. In most cases
that's sufficient - there's nothing for raid to do (be it hardware
raid or software) except perform a write to the bad place, just
to trigger the in-disk remapping procedure. Even the cheapest drives
nowadays have some remapping capability.
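That in-disk remapping is easy to watch from userspace; roughly (the
sector number is the one from the log earlier in this thread, and the
write of course destroys whatever was stored in that sector):
  smartctl -A /dev/sdb | egrep -i 'realloc|pending'   # the drive's own remap counters
  # writing the bad LBA usually makes the drive remap it:
  dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=42644555 oflag=direct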
There was an idea some years ago of having an additional layer
between a block device and whatever sits above it (a filesystem or
something else) that would just do bad-block remapping. Maybe it was
even implemented in LVM or the IBM-proposed EVMS (the version that
included in-kernel stuff too, not only the userspace management), but
I don't remember the details anymore. In any case - if memory serves
me right - there was little interest in it, for exactly this reason:
drives are now more intelligent, and there is hardly a notion of a
"bad block" anymore, at least not a persistent one, and at least not
one visible to the upper layers.
/mjt
* Re: Software RAID when it works and when it doesn't
2007-10-20 13:45 ` Michael Tokarev
@ 2007-10-20 13:55 ` Justin Piszcz
0 siblings, 0 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-20 13:55 UTC (permalink / raw)
To: Michael Tokarev
Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
linux-raid
On Sat, 20 Oct 2007, Michael Tokarev wrote:
> There was an idea some years ago about having an additional layer on
> between a block device and whatever else is above it (filesystem or
> something else), that will just do bad block remapping. Maybe it was
> even implemented in LVM or IBM-proposed EVMS (the version that included
> in-kernel stuff too, not only the userspace management), but I don't
> remember details anymore. In any case, - but again, if memory serves
> me right, -- there was low interest in that because of exactly this --
> drives are now more intelligent, there's hardly a notion of "bad block"
> anymore, at least persistent bad block, -- at least visible to the
> upper layers.
>
> /mjt
>
When I run 3dm2 (the 3ware 3dm2 tools/daemon) I often see messages like
"LBA remapped sector, success", etc.
My question is: how come I do not see this with mdadm/software RAID?
Justin.
* Re: Software RAID when it works and when it doesn't
2007-10-19 7:07 ` Alberto Alonso
2007-10-19 15:02 ` Justin Piszcz
@ 2007-10-23 22:45 ` Bill Davidsen
2007-10-24 5:50 ` Alberto Alonso
1 sibling, 1 reply; 23+ messages in thread
From: Bill Davidsen @ 2007-10-23 22:45 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
Alberto Alonso wrote:
> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>
>> Mike Accetta <maccetta@laurelnetworks.com> writes:
>>
>
>
>> What I would like to see is a timeout driven fallback mechanism. If
>> one mirror does not return the requested data within a certain time
>> (say 1 second) then the request should be duplicated on the other
>> mirror. If the first mirror later unchokes then it remains in the
>> raid, if it fails it gets removed. But (at least reads) should not
>> have to wait for that process.
>>
>> Even better would be if some write delay could also be used. The still
>> working mirror would get an increase in its serial (so on reboot you
>> know one disk is newer). If the choking mirror unchokes then it can
>> write back all the delayed data and also increase its serial to
>> match. Otherwise it gets really failed. But you might have to use
>> bitmaps for this or the cache size would limit its usefullnes.
>>
>> MfG
>> Goswin
>>
>
> I think a timeout on both: reads and writes is a must. Basically I
> believe that all problems that I've encountered issues using software
> raid would have been resolved by using a timeout within the md code.
>
> This will keep a server from crashing/hanging when the underlying
> driver doesn't properly handle hard drive problems. MD can be
> smarter than the "dumb" drivers.
>
> Just my thoughts though, as I've never got an answer as to whether or
> not md can implement its own timeouts.
I'm not sure the timeouts are the problem. Even if md did its own
timeout, it would then need a way to tell the driver (or device) to stop
retrying. I don't believe that's available, certainly not everywhere,
and anything other than everywhere would turn the md code into a nest of
exceptions.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: Software RAID when it works and when it doesn't
2007-10-23 22:45 ` Bill Davidsen
@ 2007-10-24 5:50 ` Alberto Alonso
2007-10-24 20:04 ` Bill Davidsen
0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-24 5:50 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
> I'm not sure the timeouts are the problem, even if md did its own
> timeout, it then needs a way to tell the driver (or device) to stop
> retrying. I don't believe that's available, certainly not everywhere,
> and anything other than everywhere would turn the md code into a nest of
> exceptions.
>
If we lose the ability to communicate with that drive I don't see it
as a problem (that's the whole point: we kick it out of the array). So,
even if we can't tell the driver about the failure we are still OK; md
could successfully deal with misbehaved drivers.
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-24 5:50 ` Alberto Alonso
@ 2007-10-24 20:04 ` Bill Davidsen
2007-10-24 20:18 ` Alberto Alonso
2007-10-26 16:12 ` Goswin von Brederlow
0 siblings, 2 replies; 23+ messages in thread
From: Bill Davidsen @ 2007-10-24 20:04 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
Alberto Alonso wrote:
> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
>
>
>> I'm not sure the timeouts are the problem, even if md did its own
>> timeout, it then needs a way to tell the driver (or device) to stop
>> retrying. I don't believe that's available, certainly not everywhere,
>> and anything other than everywhere would turn the md code into a nest of
>> exceptions.
>>
>>
>
> If we loose the ability to communication to that drive I don't see it
> as a problem (that's the whole point, we kick it out of the array). So,
> if we can't tell the driver about the failure we are still OK, md could
> successfully deal with misbehaved drivers.
I think what you really want is to notice how long the drive and driver
took to recover or fail, and take action based on that. In general "kick
the drive" is not optimal for a few bad spots, even if the drive
recovery sucks.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: Software RAID when it works and when it doesn't
2007-10-24 20:04 ` Bill Davidsen
@ 2007-10-24 20:18 ` Alberto Alonso
2007-10-26 16:12 ` Goswin von Brederlow
1 sibling, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-24 20:18 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote:
> I think what you really want is to notice how long the drive and driver
> took to recover or fail, and take action based on that. In general "kick
> the drive" is not optimal for a few bad spots, even if the drive
> recovery sucks.
The problem is that the driver never comes back and the whole
array hangs, waiting forever. That's why a timeout within the
md code is needed to recover from this type of driver.
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-19 15:02 ` Justin Piszcz
2007-10-20 13:45 ` Michael Tokarev
@ 2007-10-26 16:11 ` Goswin von Brederlow
2007-10-26 16:11 ` Justin Piszcz
1 sibling, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-26 16:11 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
linux-raid
Justin Piszcz <jpiszcz@lucidpixels.com> writes:
> On Fri, 19 Oct 2007, Alberto Alonso wrote:
>
>> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>>> Mike Accetta <maccetta@laurelnetworks.com> writes:
>>
>>> What I would like to see is a timeout driven fallback mechanism. If
>>> one mirror does not return the requested data within a certain time
>>> (say 1 second) then the request should be duplicated on the other
>>> mirror. If the first mirror later unchokes then it remains in the
>>> raid, if it fails it gets removed. But (at least reads) should not
>>> have to wait for that process.
>>>
>>> Even better would be if some write delay could also be used. The still
>>> working mirror would get an increase in its serial (so on reboot you
>>> know one disk is newer). If the choking mirror unchokes then it can
>>> write back all the delayed data and also increase its serial to
>>> match. Otherwise it gets really failed. But you might have to use
>>> bitmaps for this or the cache size would limit its usefullnes.
>>>
>>> MfG
>>> Goswin
>>
>> I think a timeout on both: reads and writes is a must. Basically I
>> believe that all problems that I've encountered issues using software
>> raid would have been resolved by using a timeout within the md code.
>>
>> This will keep a server from crashing/hanging when the underlying
>> driver doesn't properly handle hard drive problems. MD can be
>> smarter than the "dumb" drivers.
>>
>> Just my thoughts though, as I've never got an answer as to whether or
>> not md can implement its own timeouts.
>>
>> Alberto
>>
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> I have a question with re-mapping sectors, can software raid be as
> efficient or good at remapping bad sectors as an external raid
> controller for, e.g., raid 10 or raid5?
>
> Justin.
Software RAID does no remapping of bad sectors at all. It assumes the
disks will do sufficient remapping.
MfG
Goswin
* Re: Software RAID when it works and when it doesn't
2007-10-26 16:11 ` Goswin von Brederlow
@ 2007-10-26 16:11 ` Justin Piszcz
0 siblings, 0 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-26 16:11 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Alberto Alonso, Mike Accetta, Neil Brown, linux-raid
On Fri, 26 Oct 2007, Goswin von Brederlow wrote:
> Justin Piszcz <jpiszcz@lucidpixels.com> writes:
>
>> On Fri, 19 Oct 2007, Alberto Alonso wrote:
>>
>>> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>>>> Mike Accetta <maccetta@laurelnetworks.com> writes:
>>>
>>>> What I would like to see is a timeout driven fallback mechanism. If
>>>> one mirror does not return the requested data within a certain time
>>>> (say 1 second) then the request should be duplicated on the other
>>>> mirror. If the first mirror later unchokes then it remains in the
>>>> raid, if it fails it gets removed. But (at least reads) should not
>>>> have to wait for that process.
>>>>
>>>> Even better would be if some write delay could also be used. The still
>>>> working mirror would get an increase in its serial (so on reboot you
>>>> know one disk is newer). If the choking mirror unchokes then it can
>>>> write back all the delayed data and also increase its serial to
>>>> match. Otherwise it gets really failed. But you might have to use
>>>> bitmaps for this or the cache size would limit its usefullnes.
>>>>
>>>> MfG
>>>> Goswin
>>>
>>> I think a timeout on both: reads and writes is a must. Basically I
>>> believe that all problems that I've encountered issues using software
>>> raid would have been resolved by using a timeout within the md code.
>>>
>>> This will keep a server from crashing/hanging when the underlying
>>> driver doesn't properly handle hard drive problems. MD can be
>>> smarter than the "dumb" drivers.
>>>
>>> Just my thoughts though, as I've never got an answer as to whether or
>>> not md can implement its own timeouts.
>>>
>>> Alberto
>>>
>>>
>>> -
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> I have a question with re-mapping sectors, can software raid be as
>> efficient or good at remapping bad sectors as an external raid
>> controller for, e.g., raid 10 or raid5?
>>
>> Justin.
>
> Software raid makes no remapping of bad sectors at all. It assumes the
> disks will do sufficient remapping.
>
> MfG
> Goswin
>
Thanks, this is what I was looking for.
Justin.
* Re: Software RAID when it works and when it doesn't
2007-10-24 20:04 ` Bill Davidsen
2007-10-24 20:18 ` Alberto Alonso
@ 2007-10-26 16:12 ` Goswin von Brederlow
2007-10-26 17:09 ` Alberto Alonso
1 sibling, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-26 16:12 UTC (permalink / raw)
To: Bill Davidsen
Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
linux-raid
Bill Davidsen <davidsen@tmr.com> writes:
> Alberto Alonso wrote:
>> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
>>
>>
>>> I'm not sure the timeouts are the problem, even if md did its own
>>> timeout, it then needs a way to tell the driver (or device) to stop
>>> retrying. I don't believe that's available, certainly not
>>> everywhere, and anything other than everywhere would turn the md
>>> code into a nest of exceptions.
>>>
>>>
>>
>> If we loose the ability to communication to that drive I don't see it
>> as a problem (that's the whole point, we kick it out of the array). So,
>> if we can't tell the driver about the failure we are still OK, md could
>> successfully deal with misbehaved drivers.
>
> I think what you really want is to notice how long the drive and
> driver took to recover or fail, and take action based on that. In
> general "kick the drive" is not optimal for a few bad spots, even if
> the drive recovery sucks.
Depending on the hardware you can still access a different disk while
another one is resetting. But since there is no timeout in md it won't
try to use any other disk while one is stuck.
That is exactly what I miss.
MfG
Goswin
* Re: Software RAID when it works and when it doesn't
2007-10-26 16:12 ` Goswin von Brederlow
@ 2007-10-26 17:09 ` Alberto Alonso
2007-10-27 15:26 ` Bill Davidsen
0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-26 17:09 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: Bill Davidsen, Mike Accetta, Neil Brown, linux-raid
On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote:
> Depending on the hardware you can still access a different disk while
> another one is reseting. But since there is no timeout in md it won't
> try to use any other disk while one is stuck.
>
> That is exactly what I miss.
>
> MfG
> Goswin
> -
That is exactly what I've been talking about. Can md implement
timeouts and not just leave it to the drivers?
I can't believe it, but last night another array hit the dust when
1 of the 12 drives went bad. This year is just a nightmare for
me. It brought the whole network down until I was able to mark the
drive failed and reboot to remove it from the array.
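A stuck member can sometimes be kicked out by hand without a reboot;
a sketch (the member name is only an example):
  mdadm /dev/md0 --fail /dev/sdl1 --remove /dev/sdl1
  # or per member through sysfs:
  echo faulty > /sys/block/md0/md/dev-sdl1/state
Whether that helps when the lower layer is truly wedged is another
question, but when it works it avoids the reboot.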
Alberto
* Re: Software RAID when it works and when it doesn't
2007-10-26 17:09 ` Alberto Alonso
@ 2007-10-27 15:26 ` Bill Davidsen
2007-11-02 8:47 ` Alberto Alonso
0 siblings, 1 reply; 23+ messages in thread
From: Bill Davidsen @ 2007-10-27 15:26 UTC (permalink / raw)
To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
Alberto Alonso wrote:
> On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote:
>
>
>> Depending on the hardware you can still access a different disk while
>> another one is reseting. But since there is no timeout in md it won't
>> try to use any other disk while one is stuck.
>>
>> That is exactly what I miss.
>>
>> MfG
>> Goswin
>> -
>>
>
> That is exactly what I've been talking about. Can md implement
> timeouts and not just leave it to the drivers?
>
> I can't believe it but last night another array hit the dust when
> 1 of the 12 drives went bad. This year is just a nightmare for
> me. It brought all the network down until I was able to mark it
> failed and reboot to remove it from the array.
>
I'm not sure what kind of drives and drivers you use, but I certainly
have had drives go bad and get marked as failed, both on old PATA
drives and newer SATA. All the SCSI I currently use is on IBM hardware
RAID (ServeRAID), so I can only assume that a failure would be noted.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: Software RAID when it works and when it doesn't
2007-10-27 15:26 ` Bill Davidsen
@ 2007-11-02 8:47 ` Alberto Alonso
0 siblings, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-11-02 8:47 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid
On Sat, 2007-10-27 at 11:26 -0400, Bill Davidsen wrote:
> Alberto Alonso wrote:
> > On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote:
> >
> >
> >> Depending on the hardware you can still access a different disk while
> >> another one is reseting. But since there is no timeout in md it won't
> >> try to use any other disk while one is stuck.
> >>
> >> That is exactly what I miss.
> >>
> >> MfG
> >> Goswin
> >> -
> >>
> >
> > That is exactly what I've been talking about. Can md implement
> > timeouts and not just leave it to the drivers?
> >
> > I can't believe it but last night another array hit the dust when
> > 1 of the 12 drives went bad. This year is just a nightmare for
> > me. It brought all the network down until I was able to mark it
> > failed and reboot to remove it from the array.
> >
>
> I'm not sure what kind of drives and drivers you use, but I certainly
> have drives go bad and they get marked as failed. Both on old PATA
> drives and newer SATA. All the SCSI I currently use is on IBM hardware
> RAID (ServeRAID), so I can only assume that failure would be noted.
>
--
Alberto Alonso Global Gate Systems LLC.
(512) 351-7233 http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups
Thread overview: 23+ messages (newest: 2007-11-02 8:47 UTC)
2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
2007-10-13 22:46 ` Eyal Lebedinsky
2007-10-13 22:50 ` Neil Brown
2007-10-14 5:57 ` Alberto Alonso
2007-10-16 21:57 ` Mike Accetta
2007-10-16 22:29 ` Richard Scobie
2007-10-17 21:53 ` Support
2007-10-18 15:26 ` Goswin von Brederlow
2007-10-19 7:07 ` Alberto Alonso
2007-10-19 15:02 ` Justin Piszcz
2007-10-20 13:45 ` Michael Tokarev
2007-10-20 13:55 ` Justin Piszcz
2007-10-26 16:11 ` Goswin von Brederlow
2007-10-26 16:11 ` Justin Piszcz
2007-10-23 22:45 ` Bill Davidsen
2007-10-24 5:50 ` Alberto Alonso
2007-10-24 20:04 ` Bill Davidsen
2007-10-24 20:18 ` Alberto Alonso
2007-10-26 16:12 ` Goswin von Brederlow
2007-10-26 17:09 ` Alberto Alonso
2007-10-27 15:26 ` Bill Davidsen
2007-11-02 8:47 ` Alberto Alonso
[not found] ` <471241F8.50205@harddata.com>
2007-10-14 18:22 ` Alberto Alonso