* raid5, media scans and stripe-wise resync @ 2004-10-25 15:36 David Mansfield 2004-10-25 17:19 ` Jure Pe_ar 2004-10-25 19:39 ` Bruce Lowekamp 0 siblings, 2 replies; 13+ messages in thread From: David Mansfield @ 2004-10-25 15:36 UTC (permalink / raw) To: linux-raid Hi everyone, After a few recent severe raid failures (one linux md, one 3ware), my understanding and fear about linux md is greatly increased. Single sector unrecoverable errors are doing us in! To alleviate these fears, we (my coworkers and I) believe we need to start a policy of conducting a 'background media scan' of the actual underlying physical devices in a raid 5. This is easily accomplished on the 3ware (it's built in), but we are struggling with linux md. A utility called SCU, http://www.bit-net.com/%7Ermiller/scu.html, will allow us to scan the media, and, if necessary, reassign the bad blocks. We have used this on scsi disks before, it seems to work, as a lowlevel tool. However! If two bad blocks are discovered on two different disks in the raid 5 (even if the bad blocks are in different stripes), we will be screwed, because the raid system will kick out the disk immediately when the first bad sector is found, and then reconstruction will fail when the second bad sector is found. screwed. Which brings me (finally) to my questions: 1) does linux md have a plan for integrating background media scanning and automatic sector reassignment like hardware solutions have? 2) how can we force (or manually perform) a stripe-wise resync? is it possible to take the raid offline completely, read the data with dd, compute the parity manually, reassign the bad block using SCU and rewrite the parity block with dd then put the raid online again? If #2 is possible, I'm sure a quick-and-dirty perl script could be created to do the work, which I'd be happy to do, if it's theoretically doable. Thanks, David ^ permalink raw reply [flat|nested] 13+ messages in thread
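For reference, the manual repair asked about in question 2 rests on the RAID 5 parity rule: the parity chunk of a stripe is the byte-wise XOR of its data chunks, so any single unreadable chunk equals the XOR of all the others. A minimal Python sketch of that arithmetic follows. It works on in-memory byte strings only, and the function names are made up for illustration. Finding the right chunk offsets on each member disk depends on md's chunk size and parity layout, which this sketch does not attempt.

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving_chunks):
    """Rebuild the one unreadable chunk of a stripe from all the others
    (data and parity alike). With XOR parity this is the same operation
    as computing the parity itself."""
    return xor_chunks(surviving_chunks)


# Toy stripe: three data chunks plus their parity.
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = xor_chunks([d0, d1, d2])

# Pretend d1 sits on the bad sector: recover it from the rest.
recovered = rebuild_missing([d0, d2, parity])
print(recovered == d1)
```

The same function serves both directions: XOR of the data chunks gives the parity to rewrite, and XOR of the surviving chunks gives back the lost one.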
* Re: raid5, media scans and stripe-wise resync 2004-10-25 15:36 raid5, media scans and stripe-wise resync David Mansfield @ 2004-10-25 17:19 ` Jure Pe_ar 2004-10-25 19:43 ` David Mansfield 2004-10-25 19:39 ` Bruce Lowekamp 1 sibling, 1 reply; 13+ messages in thread From: Jure Pe_ar @ 2004-10-25 17:19 UTC (permalink / raw) To: David Mansfield; +Cc: linux-raid On Mon, 25 Oct 2004 11:36:33 -0400 David Mansfield <md@dm.cobite.com> wrote: > 2) how can we force (or manually perform) a stripe-wise resync? is it > possible to take the raid offline completely, read the data with dd, > compute the parity manually, reassign the bad block using SCU and > rewrite the parity block with dd then put the raid online again? In raid5 there's no real need for that. When you add disk back into array, it should get fully resynced anyway. I've written a short blurb in my blog about a rather rude method to handle misbehaving disks. Basically take it out of the array, run badblocks -w on it for a week and if it's ok, put it back :) -- Jure Pečar http://jure.pecar.org/ - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 17:19 ` Jure Pe_ar @ 2004-10-25 19:43 ` David Mansfield 2004-10-25 20:29 ` Guy 0 siblings, 1 reply; 13+ messages in thread From: David Mansfield @ 2004-10-25 19:43 UTC (permalink / raw) To: Jure Pe_ar; +Cc: linux-raid On Mon, 2004-10-25 at 13:19, Jure Pe_ar wrote: > On Mon, 25 Oct 2004 11:36:33 -0400 > David Mansfield <md@dm.cobite.com> wrote: > > > 2) how can we force (or manually perform) a stripe-wise resync? is it > > possible to take the raid offline completely, read the data with dd, > > compute the parity manually, reassign the bad block using SCU and > > rewrite the parity block with dd then put the raid online again? > > In raid5 there's no real need for that. When you add disk back into array, > it should get fully resynced anyway. > Not quite. If disk 0 has a bad sector in stripe 0, and disk 1 has a bad sector in stripe 1, you will totally kill your array. It happens. It happened to us. Two bad sectors on two separate disks, but not on the same stripes. In a hardware raid solution, you would only die if both bad sectors were in the same stripe, because when it encounters the bad sector, it doesn't eject the disk from the array. It reassigns the bad block, and resyncs just that stripe. In the software situation, the entire disk will be ejected from the array after the first bad sector is detected. During resync, you will encounter the second bad sector (other drive), but because the information on the old disk 0 has been destroyed (the disk has been ejected from the array) your array is now dead. Does this make sense? > I've written a short blurb in my blog about a rather rude method to handle > misbehaving disks. Basically take it out of the array, run badblocks -w on > it for a week and if it's ok, put it back :) > Won't work if there are any bad sectors on any of the other disks. Even one other bad sector and your array is toast. David ^ permalink raw reply [flat|nested] 13+ messages in thread
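The scenario can be made concrete with a toy model. The Python sketch below is an illustration only, not md or controller code: a bad sector is a (disk, stripe) pair, a RAID 5 stripe can be rebuilt when at most one of its chunks is unreadable, and the two policies are simplified versions of the behaviour described in the message above. All names are hypothetical.

```python
def survives_per_stripe_repair(bad_sectors):
    """Policy described for hardware RAID: repair each stripe on its own.
    Data is lost only if some stripe has two or more unreadable chunks."""
    per_stripe = {}
    for disk, stripe in bad_sectors:
        per_stripe.setdefault(stripe, set()).add(disk)
    return all(len(disks) <= 1 for disks in per_stripe.values())


def survives_whole_disk_eject(bad_sectors):
    """Policy described for md: the first read error ejects the whole disk,
    and the rebuild must then read every stripe of every remaining disk.
    Any bad sector on another disk makes the rebuild fail."""
    if not bad_sectors:
        return True
    ejected_disk = bad_sectors[0][0]
    return all(disk == ejected_disk for disk, _ in bad_sectors)


# Disk 0 bad in stripe 0, disk 1 bad in stripe 1 (the case from this thread).
different_stripes = [(0, 0), (1, 1)]
# Both bad sectors in the same stripe: unrecoverable under either policy.
same_stripe = [(0, 0), (1, 0)]

for name, case in [("different stripes", different_stripes),
                   ("same stripe", same_stripe)]:
    print(name,
          "| per-stripe repair:", survives_per_stripe_repair(case),
          "| whole-disk eject:", survives_whole_disk_eject(case))
```

In this model the two policies differ only for the "different stripes" case, which is exactly the failure being reported.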
* RE: raid5, media scans and stripe-wise resync 2004-10-25 19:43 ` David Mansfield @ 2004-10-25 20:29 ` Guy 2004-10-25 20:35 ` David Mansfield 2004-10-25 22:02 ` Konstantin Olchanski 0 siblings, 2 replies; 13+ messages in thread From: Guy @ 2004-10-25 20:29 UTC (permalink / raw) To: 'David Mansfield', 'Jure Pe_ar'; +Cc: linux-raid Someone said: "In a hardware raid solution, you would only die if both bad sectors were in the same stripe, because when it encounters the bad sector, it doesn't eject the disk from the array. It reassigns the bad block, and resyncs just that stripe." In a hardware solution, if 1 disk has a bad sector and another disk fails, game over. The only way I know to avoid this is RAID6. I hope RAID6 becomes stable some day. Guy ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: raid5, media scans and stripe-wise resync 2004-10-25 20:29 ` Guy @ 2004-10-25 20:35 ` David Mansfield 2004-10-25 20:48 ` Jure Pe_ar 2004-10-25 20:56 ` Guy 2004-10-25 22:02 ` Konstantin Olchanski 1 sibling, 2 replies; 13+ messages in thread From: David Mansfield @ 2004-10-25 20:35 UTC (permalink / raw) To: Guy; +Cc: 'Jure Pe_ar', linux-raid On Mon, 2004-10-25 at 16:29, Guy wrote: > Someone said: > "In a hardware raid solution, you would only die if both bad sectors were in > the same stripe, because when it encounters the bad sector, it doesn't eject > the disk from the array. It reassigns the bad block, and resyncs just that > stripe." > > In a hardware solution, if 1 disk has a bad sector and another disk fails, > game over. The only way I know to avoid this is RAID6. I hope RAID6 > becomes stable some day. > This is true, but has nothing to do with what I'm talking about. Everyone is missing my point. The point is that NEITHER DRIVE 'FAILS'. They just have unrecoverable read errors, or bad sectors. As long as the two bad sectors are not in the same stripe, you have not lost any data (theoretically, for s/w and realistically for h/w). It is a FACT that if a h/w raid controller encounters a bad sector, it will *immediately* reassign it and reconstruct the stripe before moving on. If there are no other bad sectors in that stripe, you are FINE. Think about it. If later, (say 5 seconds later) another unrecoverable error is encountered on a different disk, different stripe, it will be handled fine, just as above. Compare this to the S/W raid where the entire disk is ejected from the array when the first bad sector is encountered. It cannot recover from the 'two bad sectors on two disks in two different stripes' failure scenario. H/W raid can. David ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 20:35 ` David Mansfield @ 2004-10-25 20:48 ` Jure Pe_ar 2004-10-25 21:09 ` David Mansfield 2004-10-25 20:56 ` Guy 1 sibling, 1 reply; 13+ messages in thread From: Jure Pe_ar @ 2004-10-25 20:48 UTC (permalink / raw) To: David Mansfield; +Cc: bugzilla, linux-raid On Mon, 25 Oct 2004 16:35:32 -0400 David Mansfield <md@dm.cobite.com> wrote: > The point is that NEITHER DRIVE 'FAILS'. They just have unrecoverable > read errors, or bad sectors. As long as the two bad sectors are not in > the same stripe, you have not lost any data (theoretically, for s/w and > realistically for h/w). As I see the problem, the definition of what is a "failed drive" is different from sysadmin's point of view and from md's point of view. Md freaks out on every single unrecoverable read error, but these usually do not indicate a completely failed and dead drive. What needs to be done is to give md some more knowledge about disk errors, disk behaviour at dying and possibly integrate it in some way with smartd. This has been requested every now and then at least for the last three years, however nobody started to work on something like this, at least I'm not aware of any such activity. How exactly this could / should be accomplished is an interesting topic too. -- Jure Pečar http://jure.pecar.org/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 20:48 ` Jure Pe_ar @ 2004-10-25 21:09 ` David Mansfield 0 siblings, 0 replies; 13+ messages in thread From: David Mansfield @ 2004-10-25 21:09 UTC (permalink / raw) To: Jure Pe_ar; +Cc: bugzilla, linux-raid On Mon, 2004-10-25 at 16:48, Jure Pe_ar wrote: > On Mon, 25 Oct 2004 16:35:32 -0400 > David Mansfield <md@dm.cobite.com> wrote: > > > The point is that NEITHER DRIVE 'FAILS'. They just have unrecoverable > > read errors, or bad sectors. As long as the two bad sectors are not in > > the same stripe, you have not lost any data (theoretically, for s/w and > > realistically for h/w). > > As I see the problem, the definition of what is a "failed drive" is > different from sysadmin's point of view and from md's point of view. > Exactly. md always kicks the entire drive out. Hardware raid doesn't take these extreme measures if a sector reassignment can take care of the problem immediately. In the md case, we are stuck with having to resync an entire disk, which is terrible if there is another sector on a different disk that is bad. > Md freaks out on every single unrecoverable read error, but these usually do > not indicate a completely failed and dead drive. > In fact, very often an unrecoverable read error is an isolated defect. > What needs to be done is to give md some more knowledge about disk errors, > disk behaviour at dying and possibly integrate it in some way with smartd. > This has been requested every now and then at least for the last three > years, however nobody started to work on something like this, at least I'm > not aware of any such activity. > Ok. Thanks for the info. I tried googling and came up with nothing. > How exactly this could / should be accomplished is an interesting topic too. > If only I had an extra few months... David ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: raid5, media scans and stripe-wise resync 2004-10-25 20:35 ` David Mansfield 2004-10-25 20:48 ` Jure Pe_ar @ 2004-10-25 20:56 ` Guy 1 sibling, 0 replies; 13+ messages in thread From: Guy @ 2004-10-25 20:56 UTC (permalink / raw) To: 'David Mansfield'; +Cc: 'Jure Pe_ar', linux-raid I understand your point. The bad sector issue has been talked about here many times. Bad sectors have been a pain in the @$$ for me for 2 years. If you search the archives I am sure you would find a message from me with very similar concerns. I guess I was just pointing out that RAID6 will help. I think I have been lucky and have not had bad blocks on 2 disks at the same time (not sure). But I do understand that md can't deal with them. Marking a disk as failed when only 1 sector has failed is not a good solution. And yes, most (maybe all) hardware RAID systems "correct" bad sectors. Some count them and predict the drive is bad based on too many "corrected" bad sectors. EMC's big RAID systems copy the failing disk to a spare and place an auto service call. The failing disk is not taken out of service until it is physically replaced, since it still does have data and is working. By doing it this way the data is redundant during the whole process. Very clever. Sorry if I went off topic. Guy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 20:29 ` Guy 2004-10-25 20:35 ` David Mansfield @ 2004-10-25 22:02 ` Konstantin Olchanski 2004-10-26 2:34 ` Guy 1 sibling, 1 reply; 13+ messages in thread From: Konstantin Olchanski @ 2004-10-25 22:02 UTC (permalink / raw) To: Guy; +Cc: 'David Mansfield', 'Jure Pe_ar', linux-raid On Mon, Oct 25, 2004 at 04:29:09PM -0400, anybody wrote: > 1 disk has a bad sector and another disk fails, game over. On a single-disk, 1 bad sector kills just the one unlucky file. On a degraded RAID0 or RAID5, 1 bad sector kills the filesystem. On a healthy RAID0 or RAID5, 2 bad sectors on different disks kill the filesystem. Does this make sense? RAID is less fault tolerant than a single disk? -- Konstantin Olchanski Data Acquisition Systems: The Bytes Must Flow! Email: olchansk-at-triumf-dot-ca Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: raid5, media scans and stripe-wise resync 2004-10-25 22:02 ` Konstantin Olchanski @ 2004-10-26 2:34 ` Guy 0 siblings, 0 replies; 13+ messages in thread From: Guy @ 2004-10-26 2:34 UTC (permalink / raw) To: 'Konstantin Olchanski' Cc: 'David Mansfield', 'Jure Pe_ar', linux-raid I have a cron job that tests each disk once per day. This really helps. But I have still had md find a bad sector. But the risk of having 2 disks with bad sectors is very low if you test each night. Guy ^ permalink raw reply [flat|nested] 13+ messages in thread
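The message does not say what the nightly test actually runs. One possible form, sketched here in Python under that caveat, is a read-only pass over each member device that records the offsets where a read fails. The function name and block size are assumptions made for this example. The scan only reads, so it reports problem areas but reassigns nothing, and it needs read permission on the device.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024):
    """Read `path` from start to end in fixed-size blocks.
    Returns (bytes_read_ok, list_of_failed_block_offsets)."""
    failed = []
    ok_bytes = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also works for block devices
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
                ok_bytes += len(data)
                if len(data) < want:
                    failed.append(offset)  # short read: treat as suspect
            except OSError:
                # The kernel reports an unreadable sector as an I/O error.
                failed.append(offset)
            offset += want
    finally:
        os.close(fd)
    return ok_bytes, failed
```

A cron job could call `scan_for_read_errors` once per member disk and mail the list of failed offsets. Note that reads of recently accessed data may be served from the page cache rather than the platters, so a scan like this is only a rough check compared with a drive's own long self-test.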
* Re: raid5, media scans and stripe-wise resync 2004-10-25 15:36 raid5, media scans and stripe-wise resync David Mansfield 2004-10-25 17:19 ` Jure Pe_ar @ 2004-10-25 19:39 ` Bruce Lowekamp 2004-10-25 19:47 ` David Mansfield 2004-10-26 9:56 ` berk walker 1 sibling, 2 replies; 13+ messages in thread From: Bruce Lowekamp @ 2004-10-25 19:39 UTC (permalink / raw) To: David Mansfield; +Cc: linux-raid There was a recent conversation on this mailing list about transparently recovering from read errors (essentially just rewriting the bad stripe and letting the disk handle it), but I think it focused on Raid 1. It would be a natural for Raid 5 or 6, but I haven't seen an experimental patch to do that. If you just want to monitor, look at http://smartmontools.sourceforge.net Each of the drives in my array has a monitoring config: /dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@cs.wm.edu Two weeks ago I got email that one disk had a bad read on a sector during its weekly long scan (an entire surface scan). I failed that drive manually, waited until it resynced on the spare, overwrote the entire drive to let the drive clear the sector (and make sure there weren't any other problems), then reran the test and set that drive as the spare. I'd still feel safer if it automatically overwrote only the sector with the read error, but at least this way I knew that the other 9 drives had passed a surface scan just before, so I wasn't likely to run into a second read failure on rebuild. Bruce -- Bruce Lowekamp (lowekamp@cs.wm.edu) Computer Science Dept, College of William and Mary ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 19:39 ` Bruce Lowekamp @ 2004-10-25 19:47 ` David Mansfield 2004-10-26 9:56 ` berk walker 1 sibling, 0 replies; 13+ messages in thread From: David Mansfield @ 2004-10-25 19:47 UTC (permalink / raw) To: Bruce Lowekamp; +Cc: linux-raid On Mon, 2004-10-25 at 15:39, Bruce Lowekamp wrote: > There was a recent conversation on this mailing list about > transparently recovering from read errors (essentially just rewriting > the bad stripe and letting the disk handle it), but I think it focused > on Raid 1. It would be a natural for Raid 5 or 6, but I haven't seen > an experimental patch to do that. > > If you just want to monitor, look at http://smartmontools.sourceforge.net > each of the drives in my array has a monitoring config: > /dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m > lowekamp@cs.wm.edu > Thanks for the reference. > two weeks ago I got email that one disk had a bad read on a sector > during its weekly long scan (an entire surface scan). I failed that > drive manually, waited until it resynced on the spare, overwrote the > entire drive to let the drive clear the sector (and make sure there > weren't any other problems), then reran the test and set that drive as > the spare. > Check out the utility 'scu' at the url: http://www.bit-net.com/%7Ermiller/scu.html It will allow you to 'reassign' the block directly by accessing the scsi commands. I've tried the rewrite method you used above, and once or twice had problems. > I'd still feel safer if it automatically overwrote only the sector > with the read error, but at least this way I knew that the other 9 > drives had passed a surface scan just before, so I wasn't likely to > run into a second read failure on rebuild. > Yeah. After scanning all disks you are reasonably assured. But should it happen that there are two defects, you are completely screwed. No way around it, I think. I'd really like a way to resync a single stripe... 
David ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: raid5, media scans and stripe-wise resync 2004-10-25 19:39 ` Bruce Lowekamp 2004-10-25 19:47 ` David Mansfield @ 2004-10-26 9:56 ` berk walker 1 sibling, 0 replies; 13+ messages in thread From: berk walker @ 2004-10-26 9:56 UTC (permalink / raw) Cc: linux-raid One problem with doing a surface scan which writes and reads back the data is that in the event of weak/worn media, the data can appear to be good, but degrade quickly (mag fields go soft). Just my own 2 cents, but the sick fella should be shot and buried immediately, no second chances. b- ^ permalink raw reply [flat|nested] 13+ messages in thread