* Software RAID when it works and when it doesn't
@ 2007-10-13 18:40 Alberto Alonso
2007-10-13 22:46 ` Eyal Lebedinsky
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-13 18:40 UTC (permalink / raw)
To: linux-raid
Over the past several months I have encountered 3
cases where software RAID failed to keep the
servers up and running.
In all cases, the failure was on a single drive,
yet the whole md device and server became unresponsive.
(usb-storage)
In one situation a RAID 0 across 2 USB drives failed
when one of the drives accidentally got turned off.
(sata)
In a second case a disk started generating reports like:
end_request: I/O error, dev sdb, sector 42644555
(sata)
The third case (which I'm living right now) is a disk
that I can see during the boot process but on which
operations never return (i.e. fdisk -l /dev/sdc hangs).
(pata)
I have had at least 4 situations on old servers based
on pata disks where disk failures were successfully
flagged and arrays were degraded automatically.
So, this is all making me wonder under what circumstances
software RAID may have problems detecting disk failures.
I need to come up with a best-practices solution, and I also
need to understand more as I move into RAID over the local
network (i.e. iSCSI, AoE or NBD). Could a disk failure in
one of the servers, or a server going offline, bring the
whole array down?
Thanks for any information or comments,
Alberto
^ permalink raw reply [flat|nested] 23+ messages in thread

* Re: Software RAID when it works and when it doesn't
  2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
@ 2007-10-13 22:46 ` Eyal Lebedinsky
  2007-10-13 22:50 ` Neil Brown
  [not found] ` <471241F8.50205@harddata.com>
  2 siblings, 0 replies; 23+ messages in thread
From: Eyal Lebedinsky @ 2007-10-13 22:46 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-raid

RAID0 is non-redundant, so a disk failure will correctly fail the array.

Alberto Alonso wrote:
> Over the past several months I have encountered 3
> cases where software RAID failed to keep the
> servers up and running.
>
> In all cases, the failure was on a single drive,
> yet the whole md device and server became unresponsive.
>
> (usb-storage)
> In one situation a RAID 0 across 2 USB drives failed
> when one of the drives accidentally got turned off.
>
> (sata)
> In a second case a disk started generating reports like:
> end_request: I/O error, dev sdb, sector 42644555
>
> (sata)
> The third case (which I'm living right now) is a disk
> that I can see during the boot process but on which
> operations never return (i.e. fdisk -l /dev/sdc hangs).
>
> (pata)
> I have had at least 4 situations on old servers based
> on pata disks where disk failures were successfully
> flagged and arrays were degraded automatically.
>
> So, this is all making me wonder under what circumstances
> software RAID may have problems detecting disk failures.
>
> I need to come up with a best-practices solution, and I also
> need to understand more as I move into RAID over the local
> network (i.e. iSCSI, AoE or NBD). Could a disk failure in
> one of the servers, or a server going offline, bring the
> whole array down?
>
> Thanks for any information or comments,
>
> Alberto

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply [flat|nested] 23+ messages in thread
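The distinction matters in practice: redundancy starts at RAID1. A
minimal mdadm sketch of the mirrored setup Alberto actually meant
(the device names are illustrative only):

  # Build a two-disk mirror; unlike RAID0, it survives one member failing.
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
  # A failed member shows up as (F) here while the array keeps running,
  # degraded.
  cat /proc/mdstat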
* Re: Software RAID when it works and when it doesn't
  2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
  2007-10-13 22:46 ` Eyal Lebedinsky
@ 2007-10-13 22:50 ` Neil Brown
  2007-10-14  5:57 ` Alberto Alonso
  [not found] ` <471241F8.50205@harddata.com>
  2 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2007-10-13 22:50 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-raid

On Saturday October 13, alberto@ggsys.net wrote:
> Over the past several months I have encountered 3
> cases where software RAID failed to keep the
> servers up and running.
>
> In all cases, the failure was on a single drive,
> yet the whole md device and server became unresponsive.
>
> (usb-storage)
> In one situation a RAID 0 across 2 USB drives failed
> when one of the drives accidentally got turned off.

RAID0 is not true RAID - there is no redundancy. If one device in a
RAID0 fails, the whole array will fail. This is expected.

> (sata)
> In a second case a disk started generating reports like:
> end_request: I/O error, dev sdb, sector 42644555

So the drive had errors - not uncommon. What happened to the array?

> (sata)
> The third case (which I'm living right now) is a disk
> that I can see during the boot process but on which
> operations never return (i.e. fdisk -l /dev/sdc hangs).

You mean "fdisk -l /dev/sdc" just hangs? That sounds like a SATA
driver error. You should report it to the SATA developers:
linux-ide@vger.kernel.org

md/RAID cannot compensate for problems in the driver code. It expects
every request that it sends down to either succeed or fail in a
reasonable amount of time.

> (pata)
> I have had at least 4 situations on old servers based
> on pata disks where disk failures were successfully
> flagged and arrays were degraded automatically.

Good!

> So, this is all making me wonder under what circumstances
> software RAID may have problems detecting disk failures.

RAID1, RAID10, RAID4, RAID5 and RAID6 will handle errors that are
correctly reported by the underlying device.

> I need to come up with a best-practices solution, and I also
> need to understand more as I move into RAID over the local
> network (i.e. iSCSI, AoE or NBD). Could a disk failure in
> one of the servers, or a server going offline, bring the
> whole array down?

It shouldn't, provided the low-level driver is functioning correctly,
and provided you are using true RAID (not RAID0 or LINEAR).

NeilBrown

^ permalink raw reply [flat|nested] 23+ messages in thread
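One related knob, not mentioned in the thread: the SCSI midlayer
(which libata-based SATA drivers sit beneath) exposes a per-device
command timeout in sysfs. A sketch, assuming a 2.6 kernel with sysfs
mounted; the device name is an example:

  # How long the midlayer waits on a command before error handling starts
  cat /sys/block/sdb/device/timeout
  # Shorten it so a dying disk fails fast instead of stalling the array
  echo 10 > /sys/block/sdb/device/timeout

This bounds individual commands only; it cannot help when the
low-level driver itself wedges, which is the failure mode described
above.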
* Re: Software RAID when it works and when it doesn't
  2007-10-13 22:50 ` Neil Brown
@ 2007-10-14  5:57 ` Alberto Alonso
  2007-10-16 21:57 ` Mike Accetta
  0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-14 5:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote:
> On Saturday October 13, alberto@ggsys.net wrote:
> > (usb-storage)
> > In one situation a RAID 0 across 2 USB drives failed
> > when one of the drives accidentally got turned off.
>
> RAID0 is not true RAID - there is no redundancy. If one device in a
> RAID0 fails, the whole array will fail. This is expected.

Sorry, I meant RAID 1. Currently we only use RAID 1 and RAID 5 on all
our systems.

> > (sata)
> > In a second case a disk started generating reports like:
> > end_request: I/O error, dev sdb, sector 42644555
>
> So the drive had errors - not uncommon. What happened to the array?

The array never became degraded, it just made the system
hang. I reported it back in May, but couldn't get it
resolved. I replaced the system and unfortunately went
to a non-RAID solution for that server.

> > (sata)
> > The third case (which I'm living right now) is a disk
> > that I can see during the boot process but on which
> > operations never return (i.e. fdisk -l /dev/sdc hangs).
>
> You mean "fdisk -l /dev/sdc" just hangs? That sounds like a SATA
> driver error. You should report it to the SATA developers:
> linux-ide@vger.kernel.org
>
> md/RAID cannot compensate for problems in the driver code. It expects
> every request that it sends down to either succeed or fail in a
> reasonable amount of time.

Yes, that's exactly what happens: fdisk, dd or any other disk
operation just hung. I will report it there, thanks for the pointer.

> > (pata)
> > I have had at least 4 situations on old servers based
> > on pata disks where disk failures were successfully
> > flagged and arrays were degraded automatically.
>
> Good!

Yep, after these results I stopped using hardware RAID. I went 100%
software RAID on all systems other than a few SCSI hardware RAID
systems that we bought as a set. Until this year, that is, when I
switched back to hardware RAID for our new critical systems due to
the problems I saw back in May.

> > So, this is all making me wonder under what circumstances
> > software RAID may have problems detecting disk failures.
>
> RAID1, RAID10, RAID4, RAID5 and RAID6 will handle errors that are
> correctly reported by the underlying device.

Yep, that's what I always thought; I'm just surprised I had so many
problems this year. It makes me wonder about the reliability of the
whole stack, though. Even if the fault is in an underlying layer, can
the md code implement its own timeouts?

> > I need to come up with a best-practices solution, and I also
> > need to understand more as I move into RAID over the local
> > network (i.e. iSCSI, AoE or NBD). Could a disk failure in
> > one of the servers, or a server going offline, bring the
> > whole array down?
>
> It shouldn't, provided the low-level driver is functioning correctly,
> and provided you are using true RAID (not RAID0 or LINEAR).
>
> NeilBrown

Sorry again for the RAID 0 mistake, I really did mean RAID 1.

I guess that since I had 3 distinct servers crash on me this year I am
getting paranoid. Is there a test suite or procedure that I can run to
test everything that can go wrong?

You mentioned that the md code cannot compensate for problems in the
driver code. Couldn't some internal timeout mechanism help? I can no
longer use software RAID on SATA for new production systems. I've
switched to 3ware cards, but they are pricey and we really don't need
them for most of our systems.

I really would like to move to server clusters and RAID on the network
devices for our larger arrays, but I need a way to properly test every
scenario, as those are our critical servers and cannot go down. I
would like to figure out a "best practices procedure" that will ensure
the correct degrading of the array upon a single failure, regardless
of the underlying driver (i.e. SATA, iSCSI, NBD, etc.)

Am I thinking too much?

Thanks,

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
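One way to approach the test procedure asked about above is deliberate
failure injection. A sketch, assuming a RAID1 at /dev/md0 with
hypothetical member names; the device-mapper "error" target is a
stand-in for a genuinely bad disk:

  # Exercise the well-behaved failure path
  mdadm /dev/md0 --fail /dev/sdc1     # array should continue, degraded
  mdadm /dev/md0 --remove /dev/sdc1
  mdadm /dev/md0 --add /dev/sdc1      # watch the resync in /proc/mdstat

  # Simulate a disk that returns I/O errors underneath md
  dmsetup create baddisk --table "0 1000000 error"

The caveat matches the complaint in this thread: mdadm --fail only
exercises the path where errors are reported; a driver that hangs
instead of failing never reaches it.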
* Re: Software RAID when it works and when it doesn't
  2007-10-14  5:57 ` Alberto Alonso
@ 2007-10-16 21:57 ` Mike Accetta
  2007-10-16 22:29 ` Richard Scobie
  ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Mike Accetta @ 2007-10-16 21:57 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: Neil Brown, linux-raid

Alberto Alonso writes:
> On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote:
> > On Saturday October 13, alberto@ggsys.net wrote:
> > > (sata)
> > > In a second case a disk started generating reports like:
> > > end_request: I/O error, dev sdb, sector 42644555
> >
> > So the drive had errors - not uncommon. What happened to the array?
>
> The array never became degraded, it just made the system
> hang. I reported it back in May, but couldn't get it
> resolved. I replaced the system and unfortunately went
> to a non-RAID solution for that server.

Was the disk driver generating any low-level errors or otherwise
indicating that it might be retrying operations on the bad drive at
the time (i.e. console diagnostics)? As Neil mentioned later, the md
layer is at the mercy of the low-level disk driver. We've observed
abysmal RAID1 recovery times on failing SATA disks because all the
time is being spent in the driver retrying operations which will never
succeed. Also, read errors don't tend to fail the array, so when the
bad disk is again accessed for some subsequent read, the whole
hopeless retry process begins anew.

I posted a patch about 6 weeks ago which attempts to improve this
situation for RAID1 by telling the driver not to retry on failures and
giving some weight to read errors for failing the array. Hopefully,
Neil is still mulling it over and it or something similar will
eventually make it into the mainline kernel as a solution for this
problem.
-- 
Mike Accetta

ECI Telecom Ltd.
Transport Networking Division, US (previously Laurel Networks)

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-16 21:57 ` Mike Accetta
@ 2007-10-16 22:29 ` Richard Scobie
  2007-10-17 21:53 ` Support
  2007-10-18 15:26 ` Goswin von Brederlow
  2 siblings, 0 replies; 23+ messages in thread
From: Richard Scobie @ 2007-10-16 22:29 UTC (permalink / raw)
  To: linux-raid

Mike Accetta wrote:

> is at the mercy of the low-level disk driver. We've observed abysmal
> RAID1 recovery times on failing SATA disks because all the time is
> being spent in the driver retrying operations which will never succeed.
> Also, read errors don't tend to fail the array, so when the bad disk is
> again accessed for some subsequent read, the whole hopeless retry
> process begins anew.

This is one issue I believe the Western Digital RE/RE2 drives address.
The TLER (time-limited error recovery) feature limits retries to try
to prevent this. I have the figure of 5 seconds in my head, but could
be wrong.

Not sure if the Seagate nearline range offers the same sort of thing.

Regards,

Richard

^ permalink raw reply [flat|nested] 23+ messages in thread
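On drives and tools that support SCT Error Recovery Control, the TLER
timer can be read and set from Linux. A sketch, assuming a
smartmontools build new enough to have the scterc option and a drive
that accepts it:

  # Read the current read/write error-recovery timers (units of 100 ms)
  smartctl -l scterc /dev/sdb
  # Cap both at 7 seconds, a common RAID-friendly setting
  smartctl -l scterc,70,70 /dev/sdb

Many desktop drives refuse the command or forget the setting across a
power cycle, so it would need reapplying at boot.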
* Re: Software RAID when it works and when it doesn't
  2007-10-16 21:57 ` Mike Accetta
  2007-10-16 22:29 ` Richard Scobie
@ 2007-10-17 21:53 ` Support
  2007-10-18 15:26 ` Goswin von Brederlow
  2 siblings, 0 replies; 23+ messages in thread
From: Support @ 2007-10-17 21:53 UTC (permalink / raw)
  To: Mike Accetta; +Cc: Neil Brown, linux-raid

On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote:
> Was the disk driver generating any low-level errors or otherwise
> indicating that it might be retrying operations on the bad drive at
> the time (i.e. console diagnostics)? As Neil mentioned later, the md
> layer is at the mercy of the low-level disk driver. We've observed
> abysmal RAID1 recovery times on failing SATA disks because all the
> time is being spent in the driver retrying operations which will never
> succeed. Also, read errors don't tend to fail the array, so when the
> bad disk is again accessed for some subsequent read, the whole
> hopeless retry process begins anew.

The console was full of errors like:

end_request: I/O error, dev sdb, sector 42644555

I don't know what generates those messages. As I asked before but
never got an answer: is there a way to do timeouts within the md code
so that we are not at the mercy of the lower-layer drivers?

> I posted a patch about 6 weeks ago which attempts to improve this
> situation for RAID1 by telling the driver not to retry on failures and
> giving some weight to read errors for failing the array. Hopefully,
> Neil is still mulling it over and it or something similar will
> eventually make it into the mainline kernel as a solution for this
> problem.
> -- 
> Mike Accetta

Thanks,

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-16 21:57 ` Mike Accetta
  2007-10-16 22:29 ` Richard Scobie
  2007-10-17 21:53 ` Support
@ 2007-10-18 15:26 ` Goswin von Brederlow
  2007-10-19  7:07 ` Alberto Alonso
  2 siblings, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-18 15:26 UTC (permalink / raw)
  To: Mike Accetta; +Cc: Alberto Alonso, Neil Brown, linux-raid

Mike Accetta <maccetta@laurelnetworks.com> writes:

> Also, read errors don't tend to fail the array, so when the bad disk is
> again accessed for some subsequent read, the whole hopeless retry
> process begins anew.
>
> I posted a patch about 6 weeks ago which attempts to improve this
> situation for RAID1 by telling the driver not to retry on failures and
> giving some weight to read errors for failing the array. Hopefully,
> Neil is still mulling it over and it or something similar will
> eventually make it into the mainline kernel as a solution for this
> problem.

What I would like to see is a timeout-driven fallback mechanism. If
one mirror does not return the requested data within a certain time
(say 1 second) then the request should be duplicated on the other
mirror. If the first mirror later unchokes then it remains in the
raid; if it fails it gets removed. But reads, at least, should not
have to wait for that process.

Even better would be if some write delay could also be used. The still
working mirror would get an increase in its serial (so on reboot you
know one disk is newer). If the choking mirror unchokes then it can
write back all the delayed data and also increase its serial to match.
Otherwise it gets really failed. But you might have to use bitmaps for
this, or the cache size would limit its usefulness.

MfG
        Goswin

^ permalink raw reply [flat|nested] 23+ messages in thread
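The bitmap mentioned at the end already exists in md as the
write-intent bitmap: it records which regions may be out of sync so a
briefly-failed mirror can be re-added without a full resync. A sketch
against a hypothetical array:

  # Add an internal write-intent bitmap to an existing array
  mdadm --grow /dev/md0 --bitmap=internal
  # After a transient failure, only the dirty regions are resynced
  mdadm /dev/md0 --re-add /dev/sdc1

This is not the timeout-driven fallback proposed above, but it makes
the "write back all the delayed data" recovery step cheap.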
* Re: Software RAID when it works and when it doesn't
  2007-10-18 15:26 ` Goswin von Brederlow
@ 2007-10-19  7:07 ` Alberto Alonso
  2007-10-19 15:02 ` Justin Piszcz
  2007-10-23 22:45 ` Bill Davidsen
  0 siblings, 2 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-19 7:07 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Mike Accetta, Neil Brown, linux-raid

On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
> Mike Accetta <maccetta@laurelnetworks.com> writes:
>
> What I would like to see is a timeout-driven fallback mechanism. If
> one mirror does not return the requested data within a certain time
> (say 1 second) then the request should be duplicated on the other
> mirror. If the first mirror later unchokes then it remains in the
> raid; if it fails it gets removed. But reads, at least, should not
> have to wait for that process.
>
> Even better would be if some write delay could also be used. The still
> working mirror would get an increase in its serial (so on reboot you
> know one disk is newer). If the choking mirror unchokes then it can
> write back all the delayed data and also increase its serial to match.
> Otherwise it gets really failed. But you might have to use bitmaps for
> this, or the cache size would limit its usefulness.
>
> MfG
>         Goswin

I think a timeout on both reads and writes is a must. Basically, I
believe that all the problems I've encountered using software RAID
would have been resolved by a timeout within the md code.

This will keep a server from crashing/hanging when the underlying
driver doesn't properly handle hard drive problems. MD can be smarter
than the "dumb" drivers.

Just my thoughts though, as I've never got an answer as to whether or
not md can implement its own timeouts.

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-19  7:07 ` Alberto Alonso
@ 2007-10-19 15:02 ` Justin Piszcz
  2007-10-20 13:45 ` Michael Tokarev
  2007-10-26 16:11 ` Goswin von Brederlow
  1 sibling, 2 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-19 15:02 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

On Fri, 19 Oct 2007, Alberto Alonso wrote:

> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>> Mike Accetta <maccetta@laurelnetworks.com> writes:
>
>> What I would like to see is a timeout-driven fallback mechanism. If
>> one mirror does not return the requested data within a certain time
>> (say 1 second) then the request should be duplicated on the other
>> mirror. If the first mirror later unchokes then it remains in the
>> raid; if it fails it gets removed. But reads, at least, should not
>> have to wait for that process.
>>
>> Even better would be if some write delay could also be used. The still
>> working mirror would get an increase in its serial (so on reboot you
>> know one disk is newer). If the choking mirror unchokes then it can
>> write back all the delayed data and also increase its serial to match.
>> Otherwise it gets really failed. But you might have to use bitmaps for
>> this, or the cache size would limit its usefulness.
>>
>> MfG
>>         Goswin
>
> I think a timeout on both reads and writes is a must. Basically, I
> believe that all the problems I've encountered using software RAID
> would have been resolved by a timeout within the md code.
>
> This will keep a server from crashing/hanging when the underlying
> driver doesn't properly handle hard drive problems. MD can be smarter
> than the "dumb" drivers.
>
> Just my thoughts though, as I've never got an answer as to whether or
> not md can implement its own timeouts.
>
> Alberto
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

I have a question about re-mapping sectors: can software RAID be as
efficient or good at remapping bad sectors as an external RAID
controller, for e.g. RAID 10 or RAID 5?

Justin.

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-19 15:02 ` Justin Piszcz
@ 2007-10-20 13:45 ` Michael Tokarev
  2007-10-20 13:55 ` Justin Piszcz
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Tokarev @ 2007-10-20 13:45 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
	linux-raid

Justin Piszcz wrote:
[]
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Justin, forgive me please, but can you learn to trim the original
messages when replying, at least to cut off the most irrelevant parts?
You're always quoting the whole message, even including the part after
a line consisting of a single minus sign "-" - a part that most MUAs
will remove when replying...

> I have a question about re-mapping sectors: can software RAID be as
> efficient or good at remapping bad sectors as an external RAID
> controller, for e.g. RAID 10 or RAID 5?

Hard disks ARE remapping bad sectors on their own. In most cases that's
sufficient - there's nothing for RAID to do (be it hardware RAID or
software) except perform a write to the bad place, just to trigger the
in-disk remapping procedure. Even the cheapest drives nowadays have
some remapping capability.

There was an idea some years ago about having an additional layer
between a block device and whatever is above it (filesystem or
something else) that would just do bad-block remapping. Maybe it was
even implemented in LVM or the IBM-proposed EVMS (the version that
included in-kernel stuff too, not only the userspace management), but I
don't remember the details anymore. In any case - again, if memory
serves me right - there was little interest in it because of exactly
this: drives are now more intelligent, and there's hardly a notion of a
"bad block" anymore, at least a persistent bad block visible to the
upper layers.

/mjt

^ permalink raw reply [flat|nested] 23+ messages in thread
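The write-to-the-bad-place trick can be done by hand, since the kernel
log names the failing sector. A sketch using the sector number reported
earlier in this thread; it destroys that sector's contents, so it is
only for a member already failed out of the array:

  # How many sectors has the drive already reallocated or left pending?
  smartctl -A /dev/sdb | grep -i -e reallocated -e pending
  # Overwrite the reported bad sector to trigger the drive's remap
  dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=42644555

If the reallocated/pending counts keep climbing afterwards, the drive
is failing faster than it can remap and should simply be replaced.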
* Re: Software RAID when it works and when it doesn't
  2007-10-20 13:45 ` Michael Tokarev
@ 2007-10-20 13:55 ` Justin Piszcz
  0 siblings, 0 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-20 13:55 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
	linux-raid

On Sat, 20 Oct 2007, Michael Tokarev wrote:

> There was an idea some years ago about having an additional layer
> between a block device and whatever is above it (filesystem or
> something else) that would just do bad-block remapping. Maybe it was
> even implemented in LVM or the IBM-proposed EVMS (the version that
> included in-kernel stuff too, not only the userspace management), but I
> don't remember the details anymore. In any case - again, if memory
> serves me right - there was little interest in it because of exactly
> this: drives are now more intelligent, and there's hardly a notion of a
> "bad block" anymore, at least a persistent bad block visible to the
> upper layers.
>
> /mjt

When I run 3dm2 (3ware 3dm2/tools/daemon) I often see "LBA remapped
sector, success" and similar messages. My question is: how come I do
not see this with mdadm/software RAID?

Justin.

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-19 15:02 ` Justin Piszcz
  2007-10-20 13:45 ` Michael Tokarev
@ 2007-10-26 16:11 ` Goswin von Brederlow
  2007-10-26 16:11 ` Justin Piszcz
  1 sibling, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-26 16:11 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
	linux-raid

Justin Piszcz <jpiszcz@lucidpixels.com> writes:

> On Fri, 19 Oct 2007, Alberto Alonso wrote:
>
>> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>>> What I would like to see is a timeout-driven fallback mechanism. If
>>> one mirror does not return the requested data within a certain time
>>> (say 1 second) then the request should be duplicated on the other
>>> mirror. [...]
>>
>> I think a timeout on both reads and writes is a must. Basically, I
>> believe that all the problems I've encountered using software RAID
>> would have been resolved by a timeout within the md code.
>> [...]
>
> I have a question about re-mapping sectors: can software RAID be as
> efficient or good at remapping bad sectors as an external RAID
> controller, for e.g. RAID 10 or RAID 5?
>
> Justin.

Software RAID makes no remapping of bad sectors at all. It assumes the
disks will do sufficient remapping.

MfG
        Goswin

^ permalink raw reply [flat|nested] 23+ messages in thread
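What md offers instead of remapping is a scrub: reading every sector of
every member and rewriting unreadable ones from the redundant copy,
which in turn triggers the disk's own reallocation. A sketch, assuming
a kernel recent enough to expose md's sysfs interface:

  # Read everything; read errors are rewritten from the mirror/parity.
  # 'repair' additionally rewrites blocks that merely mismatch.
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat                       # progress
  cat /sys/block/md0/md/mismatch_cnt     # result

Run periodically, this catches latent bad sectors before a rebuild
trips over them.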
* Re: Software RAID when it works and when it doesn't
  2007-10-26 16:11 ` Goswin von Brederlow
@ 2007-10-26 16:11 ` Justin Piszcz
  0 siblings, 0 replies; 23+ messages in thread
From: Justin Piszcz @ 2007-10-26 16:11 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Alberto Alonso, Mike Accetta, Neil Brown, linux-raid

On Fri, 26 Oct 2007, Goswin von Brederlow wrote:

> Justin Piszcz <jpiszcz@lucidpixels.com> writes:
>
>> I have a question about re-mapping sectors: can software RAID be as
>> efficient or good at remapping bad sectors as an external RAID
>> controller, for e.g. RAID 10 or RAID 5?
>>
>> Justin.
>
> Software RAID makes no remapping of bad sectors at all. It assumes the
> disks will do sufficient remapping.
>
> MfG
>         Goswin

Thanks, this is what I was looking for.

Justin.

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-19  7:07 ` Alberto Alonso
  2007-10-19 15:02 ` Justin Piszcz
@ 2007-10-23 22:45 ` Bill Davidsen
  2007-10-24  5:50 ` Alberto Alonso
  1 sibling, 1 reply; 23+ messages in thread
From: Bill Davidsen @ 2007-10-23 22:45 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

Alberto Alonso wrote:
> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
>> What I would like to see is a timeout-driven fallback mechanism. If
>> one mirror does not return the requested data within a certain time
>> (say 1 second) then the request should be duplicated on the other
>> mirror. If the first mirror later unchokes then it remains in the
>> raid; if it fails it gets removed. But reads, at least, should not
>> have to wait for that process.
>
> I think a timeout on both reads and writes is a must. Basically, I
> believe that all the problems I've encountered using software RAID
> would have been resolved by a timeout within the md code.
>
> This will keep a server from crashing/hanging when the underlying
> driver doesn't properly handle hard drive problems. MD can be smarter
> than the "dumb" drivers.
>
> Just my thoughts though, as I've never got an answer as to whether or
> not md can implement its own timeouts.

I'm not sure the timeouts are the problem. Even if md did its own
timeout, it then needs a way to tell the driver (or device) to stop
retrying. I don't believe that's available, certainly not everywhere,
and anything other than everywhere would turn the md code into a nest
of exceptions.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-23 22:45 ` Bill Davidsen
@ 2007-10-24  5:50 ` Alberto Alonso
  2007-10-24 20:04 ` Bill Davidsen
  0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-24 5:50 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
> I'm not sure the timeouts are the problem. Even if md did its own
> timeout, it then needs a way to tell the driver (or device) to stop
> retrying. I don't believe that's available, certainly not everywhere,
> and anything other than everywhere would turn the md code into a nest
> of exceptions.

If we lose the ability to communicate with that drive, I don't see it
as a problem (that's the whole point: we kick it out of the array). So
even if we can't tell the driver about the failure we are still OK; md
could successfully deal with misbehaved drivers.

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-24  5:50 ` Alberto Alonso
@ 2007-10-24 20:04 ` Bill Davidsen
  2007-10-24 20:18 ` Alberto Alonso
  2007-10-26 16:12 ` Goswin von Brederlow
  0 siblings, 2 replies; 23+ messages in thread
From: Bill Davidsen @ 2007-10-24 20:04 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

Alberto Alonso wrote:
> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
>> I'm not sure the timeouts are the problem. Even if md did its own
>> timeout, it then needs a way to tell the driver (or device) to stop
>> retrying. I don't believe that's available, certainly not everywhere,
>> and anything other than everywhere would turn the md code into a nest
>> of exceptions.
>
> If we lose the ability to communicate with that drive, I don't see it
> as a problem (that's the whole point: we kick it out of the array). So
> even if we can't tell the driver about the failure we are still OK; md
> could successfully deal with misbehaved drivers.

I think what you really want is to notice how long the drive and driver
took to recover or fail, and take action based on that. In general,
"kick the drive" is not optimal for a few bad spots, even if the
drive's recovery sucks.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-24 20:04 ` Bill Davidsen
@ 2007-10-24 20:18 ` Alberto Alonso
  0 siblings, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-24 20:18 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote:
> I think what you really want is to notice how long the drive and driver
> took to recover or fail, and take action based on that. In general,
> "kick the drive" is not optimal for a few bad spots, even if the
> drive's recovery sucks.

The problem is that the driver never comes back and the whole array
hangs, waiting forever. That's why a timeout within the md code is
needed to recover from drivers like these.

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
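When an array wedges like this, a dump of task states before the reboot
shows where things are stuck; a sketch, assuming magic SysRq is
enabled:

  # Dump all task backtraces to the kernel log; look for md threads and
  # the hung fdisk/dd sitting in D state inside the low-level driver
  echo t > /proc/sysrq-trigger
  dmesg | tail -n 100

The backtraces usually show whether the wait is inside md or inside the
controller driver - exactly the evidence the linux-ide developers would
ask for.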
* Re: Software RAID when it works and when it doesn't
  2007-10-24 20:04 ` Bill Davidsen
  2007-10-24 20:18 ` Alberto Alonso
@ 2007-10-26 16:12 ` Goswin von Brederlow
  2007-10-26 17:09 ` Alberto Alonso
  1 sibling, 1 reply; 23+ messages in thread
From: Goswin von Brederlow @ 2007-10-26 16:12 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Alberto Alonso, Goswin von Brederlow, Mike Accetta, Neil Brown,
	linux-raid

Bill Davidsen <davidsen@tmr.com> writes:

> Alberto Alonso wrote:
>> If we lose the ability to communicate with that drive, I don't see it
>> as a problem (that's the whole point: we kick it out of the array). So
>> even if we can't tell the driver about the failure we are still OK; md
>> could successfully deal with misbehaved drivers.
>
> I think what you really want is to notice how long the drive and driver
> took to recover or fail, and take action based on that. In general,
> "kick the drive" is not optimal for a few bad spots, even if the
> drive's recovery sucks.

Depending on the hardware you can still access a different disk while
another one is resetting. But since there is no timeout in md, it won't
try to use any other disk while one is stuck.

That is exactly what I'm missing.

MfG
        Goswin

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-26 16:12 ` Goswin von Brederlow
@ 2007-10-26 17:09 ` Alberto Alonso
  2007-10-27 15:26 ` Bill Davidsen
  0 siblings, 1 reply; 23+ messages in thread
From: Alberto Alonso @ 2007-10-26 17:09 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Bill Davidsen, Mike Accetta, Neil Brown, linux-raid

On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote:
> Depending on the hardware you can still access a different disk while
> another one is resetting. But since there is no timeout in md, it won't
> try to use any other disk while one is stuck.
>
> That is exactly what I'm missing.
>
> MfG
>         Goswin

That is exactly what I've been talking about. Can md implement timeouts
and not just leave it to the drivers?

I can't believe it, but last night another array hit the dust when 1 of
its 12 drives went bad. This year is just a nightmare for me. It
brought the whole network down until I was able to mark the drive
failed and reboot to remove it from the array.

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-26 17:09 ` Alberto Alonso
@ 2007-10-27 15:26 ` Bill Davidsen
  2007-11-02  8:47 ` Alberto Alonso
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Davidsen @ 2007-10-27 15:26 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

Alberto Alonso wrote:
> That is exactly what I've been talking about. Can md implement timeouts
> and not just leave it to the drivers?
>
> I can't believe it, but last night another array hit the dust when 1 of
> its 12 drives went bad. This year is just a nightmare for me. It
> brought the whole network down until I was able to mark the drive
> failed and reboot to remove it from the array.

I'm not sure what kind of drives and drivers you use, but I certainly
have drives go bad and they get marked as failed - both on old PATA
drives and newer SATA. All the SCSI I currently use is on IBM hardware
RAID (ServeRAID), so I can only assume that a failure would be noted.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Software RAID when it works and when it doesn't
  2007-10-27 15:26 ` Bill Davidsen
@ 2007-11-02  8:47 ` Alberto Alonso
  0 siblings, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-11-02 8:47 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Goswin von Brederlow, Mike Accetta, Neil Brown, linux-raid

On Sat, 2007-10-27 at 11:26 -0400, Bill Davidsen wrote:
> I'm not sure what kind of drives and drivers you use, but I certainly
> have drives go bad and they get marked as failed - both on old PATA
> drives and newer SATA. All the SCSI I currently use is on IBM hardware
> RAID (ServeRAID), so I can only assume that a failure would be noted.

-- 
Alberto Alonso                   Global Gate Systems LLC.
(512) 351-7233                   http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups

^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <471241F8.50205@harddata.com>]
* Re: Software RAID when it works and when it doesn't
       [not found] ` <471241F8.50205@harddata.com>
@ 2007-10-14 18:22 ` Alberto Alonso
  0 siblings, 0 replies; 23+ messages in thread
From: Alberto Alonso @ 2007-10-14 18:22 UTC (permalink / raw)
  To: Maurice Hilarius; +Cc: vger majordomo for lists

On Sun, 2007-10-14 at 10:21 -0600, Maurice Hilarius wrote:
> Alberto Alonso wrote:
>
> PATA (IDE) with Master and Slave drives is a "bad idea", as when one
> drive fails, the other of the Master & Slave pair often is no longer
> usable. On discrete interfaces, with all drives configured as Master
> (single), it is more tolerant.

Before SATA became the de facto standard, we used Promise PCI boards on
top of the built-in channels. As you mentioned, a single drive per
channel is a must. We only had small servers with up to 6 PATA drives.
This proved to be really reliable and handled all disk failures without
bringing the servers down.

What I am trying to determine in these posts is a combination of
hardware and software that will make software RAID a reliable solution
when disks fail.

> --
> With our best regards,
>
> Maurice W. Hilarius       Telephone: 01-780-456-9771
> Hard Data Ltd.            FAX:       01-780-456-9772
> 11060 - 166 Avenue        email: maurice@harddata.com
> Edmonton, AB, Canada      http://www.harddata.com/
> T5X 1Y3

Thanks,

Alberto

^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads: [~2007-11-02 8:47 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-13 18:40 Software RAID when it works and when it doesn't Alberto Alonso
2007-10-13 22:46 ` Eyal Lebedinsky
2007-10-13 22:50 ` Neil Brown
2007-10-14 5:57 ` Alberto Alonso
2007-10-16 21:57 ` Mike Accetta
2007-10-16 22:29 ` Richard Scobie
2007-10-17 21:53 ` Support
2007-10-18 15:26 ` Goswin von Brederlow
2007-10-19 7:07 ` Alberto Alonso
2007-10-19 15:02 ` Justin Piszcz
2007-10-20 13:45 ` Michael Tokarev
2007-10-20 13:55 ` Justin Piszcz
2007-10-26 16:11 ` Goswin von Brederlow
2007-10-26 16:11 ` Justin Piszcz
2007-10-23 22:45 ` Bill Davidsen
2007-10-24 5:50 ` Alberto Alonso
2007-10-24 20:04 ` Bill Davidsen
2007-10-24 20:18 ` Alberto Alonso
2007-10-26 16:12 ` Goswin von Brederlow
2007-10-26 17:09 ` Alberto Alonso
2007-10-27 15:26 ` Bill Davidsen
2007-11-02 8:47 ` Alberto Alonso
[not found] ` <471241F8.50205@harddata.com>
2007-10-14 18:22 ` Alberto Alonso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).