raid5, media scans and stripe-wise resync

linux-raid.vger.kernel.org archive mirror

* raid5, media scans and stripe-wise resync
@ 2004-10-25 15:36 David Mansfield
  2004-10-25 17:19 ` Jure Pečar
  2004-10-25 19:39 ` Bruce Lowekamp
  0 siblings, 2 replies; 13+ messages in thread
From: David Mansfield @ 2004-10-25 15:36 UTC (permalink / raw)
  To: linux-raid

Hi everyone,

After a few recent severe raid failures (one linux md, one 3ware), both my
understanding of linux md and my fear of it have greatly increased.  Single
sector unrecoverable errors are doing us in!

To alleviate these fears, we (my coworkers and I) believe we need to
start a policy of conducting a 'background media scan' of the actual
underlying physical devices in a raid 5.  This is easily accomplished on
the 3ware (it's built in), but we are struggling with linux md.

A utility called SCU, http://www.bit-net.com/%7Ermiller/scu.html, will
allow us to scan the media, and, if necessary, reassign the bad blocks.
We have used this on scsi disks before, and it seems to work as a low-level
tool.

However! If two bad blocks are discovered on two different disks in the
raid 5 (even if the bad blocks are in different stripes), we will be
screwed, because the raid system will kick out the disk immediately when
the first bad sector is found, and then reconstruction will fail when
the second bad sector is found.  screwed.

Which brings me (finally) to my questions:

1) does linux md have a plan for integrating background media scanning
and automatic sector reassignment like hardware solutions have?

2) how can we force (or manually perform) a stripe-wise resync? is it
possible to take the raid offline completely, read the data with dd,
compute the parity manually, reassign the bad block using SCU and
rewrite the parity block with dd then put the raid online again?

If #2 is possible, I'm sure a quick-and-dirty perl script could be
created to do the work, which I'd be happy to do, if it's theoretically
doable.
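
For what it's worth, the arithmetic behind #2 is just XOR: a raid 5 parity
chunk is the XOR of the data chunks in its stripe, so any one unreadable
chunk can be recomputed from the rest.  Below is a minimal sketch in Python
rather than perl, assuming the chunks of one stripe have already been read
off the member disks (with dd, say) into byte strings.  The function names
are made up for illustration, and the sketch deliberately ignores md's
on-disk layout (chunk size, parity rotation, superblock location), which a
real script would have to get right.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks must be the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def compute_parity(data_chunks):
    """Parity chunk for one stripe: the XOR of all its data chunks."""
    return xor_chunks(data_chunks)


def rebuild_missing(surviving_chunks):
    """Recompute the one missing chunk of a stripe (data or parity).

    surviving_chunks must hold every other chunk of that stripe,
    parity included if the missing chunk is a data chunk.
    """
    return xor_chunks(surviving_chunks)
```

The same XOR serves both directions: computing parity from the data chunks,
and recovering a lost chunk from the parity plus the remaining data.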

Thanks,
David


* Re: raid5, media scans and stripe-wise resync
  2004-10-25 15:36 raid5, media scans and stripe-wise resync David Mansfield
@ 2004-10-25 17:19 ` Jure Pečar
  2004-10-25 19:43   ` David Mansfield
  2004-10-25 19:39 ` Bruce Lowekamp
  1 sibling, 1 reply; 13+ messages in thread
From: Jure Pečar @ 2004-10-25 17:19 UTC (permalink / raw)
  To: David Mansfield; +Cc: linux-raid

On Mon, 25 Oct 2004 11:36:33 -0400
David Mansfield <md@dm.cobite.com> wrote:

> 2) how can we force (or manually perform) a stripe-wise resync? is it
> possible to take the raid offline completely, read the data with dd,
> compute the parity manually, reassign the bad block using SCU and
> rewrite the parity block with dd then put the raid online again?

In raid5 there's no real need for that. When you add the disk back into the
array, it should get fully resynced anyway.

I've written a short blurb in my blog about a rather rude method to handle
misbehaving disks. Basically take it out of the array, run badblocks -w on
it for a week and if it's ok, put it back :)


-- 

Jure Pečar
http://jure.pecar.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: raid5, media scans and stripe-wise resync
  2004-10-25 17:19 ` Jure Pečar
@ 2004-10-25 19:43   ` David Mansfield
  2004-10-25 20:29     ` Guy
  0 siblings, 1 reply; 13+ messages in thread
From: David Mansfield @ 2004-10-25 19:43 UTC (permalink / raw)
  To: Jure Pečar; +Cc: linux-raid

On Mon, 2004-10-25 at 13:19, Jure Pečar wrote:
> On Mon, 25 Oct 2004 11:36:33 -0400
> David Mansfield <md@dm.cobite.com> wrote:
> 
> > 2) how can we force (or manually perform) a stripe-wise resync? is it
> > possible to take the raid offline completely, read the data with dd,
> > compute the parity manually, reassign the bad block using SCU and
> > rewrite the parity block with dd then put the raid online again?
> 
> In raid5 there's no real need for that. When you add disk back into array,
> it should get fully resynced anyway.
> 

Not quite.  If disk 0 has a bad sector in stripe 0, and disk 1 has a bad
sector in stripe 1, you will totally kill your array.  It happens.  It
happened to us.  Two bad sectors on two separate disks, but not on the
same stripes.

In a hardware raid solution, you would only die if both bad sectors were
in the same stripe, because when it encounters the bad sector, it
doesn't eject the disk from the array.  It reassigns the bad block, and
resyncs just that stripe.

In the software situation, the entire disk will be ejected from the
array after the first bad sector is detected.  During resync, you will
encounter the second bad sector (other drive), but because the
information on the old disk 0 has been destroyed (the disk has been
ejected from the array) your array is now dead.  

Does this make sense?
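
To make the two policies concrete, here is a toy model in Python (an
illustration only, not how md or any controller is implemented).  A bad
sector is a (disk, stripe) pair and the function names are hypothetical.
Under whole-disk ejection the array survives only if every bad sector is
on the same disk; under stripe-wise repair it survives as long as no
stripe holds more than one bad sector.

```python
from collections import Counter


def survives_whole_disk_eject(bad_sectors):
    """Model of the software behaviour described above.

    The first read error ejects that entire disk.  The rebuild then has
    to read every stripe from all remaining disks, so a bad sector on any
    other disk, in any stripe, is fatal.
    """
    disks_with_errors = {disk for disk, _stripe in bad_sectors}
    return len(disks_with_errors) <= 1


def survives_stripe_wise_repair(bad_sectors):
    """Model of the hardware behaviour described above.

    Each bad sector is rebuilt from the other chunks of its own stripe,
    so data is lost only when one stripe has two or more bad sectors.
    """
    per_stripe = Counter(stripe for _disk, stripe in set(bad_sectors))
    return all(count <= 1 for count in per_stripe.values())


if __name__ == "__main__":
    scenario = [(0, 0), (1, 1)]  # disk 0 / stripe 0 and disk 1 / stripe 1
    print("whole-disk eject survives:", survives_whole_disk_eject(scenario))
    print("stripe-wise repair survives:", survives_stripe_wise_repair(scenario))
```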

> I've written a short blurb in my blog about a rather rude method to handle
> misbehaving disks. Basically take it out of the array, run badblocks -w on
> it for a week and if it's ok, put it back :)
> 

Won't work if there are any bad sectors on any of the other disks.  Even
one other bad sector and your array is toast.

David


* RE: raid5, media scans and stripe-wise resync
  2004-10-25 19:43   ` David Mansfield
@ 2004-10-25 20:29     ` Guy
  2004-10-25 20:35       ` David Mansfield
  2004-10-25 22:02       ` Konstantin Olchanski
  0 siblings, 2 replies; 13+ messages in thread
From: Guy @ 2004-10-25 20:29 UTC (permalink / raw)
  To: 'David Mansfield', 'Jure Pečar'; +Cc: linux-raid

Someone said:
"In a hardware raid solution, you would only die if both bad sectors were in
the same stripe, because when it encounters the bad sector, it doesn't eject
the disk from the array.  It reassigns the bad block, and resyncs just that
stripe."

In a hardware solution, if 1 disk has a bad sector and another disk fails,
game over.  The only way I know to avoid this is RAID6.  I hope RAID6
becomes stable some day.

Guy


* RE: raid5, media scans and stripe-wise resync
  2004-10-25 20:29     ` Guy
@ 2004-10-25 20:35       ` David Mansfield
  2004-10-25 20:48         ` Jure Pe_ar
  2004-10-25 20:56         ` Guy
  2004-10-25 22:02       ` Konstantin Olchanski
  1 sibling, 2 replies; 13+ messages in thread
From: David Mansfield @ 2004-10-25 20:35 UTC (permalink / raw)
  To: Guy; +Cc: 'Jure Pečar', linux-raid

On Mon, 2004-10-25 at 16:29, Guy wrote:
> Someone said:
> "In a hardware raid solution, you would only die if both bad sectors were in
> the same stripe, because when it encounters the bad sector, it doesn't eject
> the disk from the array.  It reassigns the bad block, and resyncs just that
> stripe."
> 
> In a hardware solution, if 1 disk has a bad sector and another disk fails,
> game over.  The only way I know to avoid this is RAID6.  I hope RAID6
> becomes stable some day.
> 

This is true, but has nothing to do with what I'm talking about. 
Everyone is missing my point.

The point is that NEITHER DRIVE 'FAILS'.  They just have unrecoverable
read errors, or bad sectors.  As long as the two bad sectors are not in
the same stripe, you have not lost any data (theoretically, for s/w and
realistically for h/w).

It is a FACT that if a h/w raid controller encounters a bad sector, it
will *immediately* reassign it and reconstruct the stripe before moving
on.  If there are no other bad sectors in that stripe, you are FINE. 
Think about it.  If later, (say 5 seconds later) another unrecoverable
error is encountered on a different disk, different stripe, it will be
handled fine, just as above.

Compare this to the S/W raid where the entire disk is ejected from the
array when the first bad sector is encountered.  It cannot recover from
the 'two bad sectors on two disks in two different stripes' failure
scenario.  H/W raid can.
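
A rough sketch of that stripe-at-a-time behaviour, in Python with in-memory
lists standing in for disks (None marks an unreadable chunk).  This is a
simulation under my own assumptions, not 3ware firmware or md code, and
the names are invented: each stripe with exactly one unreadable chunk is
rebuilt from the XOR of the other chunks and written back, and only a
stripe with two or more unreadable chunks is reported lost.

```python
def xor_bytes(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def scrub(disks):
    """Walk every stripe and repair single unreadable chunks in place.

    disks[d][s] is the chunk that disk d holds for stripe s, or None if
    that chunk cannot be read.  Returns (repaired, lost): a list of
    (disk, stripe) pairs that were rebuilt and a list of stripe numbers
    that had too many bad chunks to rebuild.
    """
    repaired, lost = [], []
    for stripe in range(len(disks[0])):
        bad = [d for d, disk in enumerate(disks) if disk[stripe] is None]
        if not bad:
            continue
        if len(bad) > 1:
            # Two unreadable chunks in one stripe: raid 5 cannot recover it.
            lost.append(stripe)
            continue
        victim = bad[0]
        others = [disk[stripe] for d, disk in enumerate(disks) if d != victim]
        # Rewriting the chunk is what gives the drive a chance to remap it.
        disks[victim][stripe] = xor_bytes(others)
        repaired.append((victim, stripe))
    return repaired, lost
```

With bad chunks on disk 0 / stripe 0 and disk 1 / stripe 1, both are
repaired and nothing is lost, because no disk is ever dropped as a whole.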

David



* Re: raid5, media scans and stripe-wise resync
  2004-10-25 20:35       ` David Mansfield
@ 2004-10-25 20:48         ` Jure Pečar
  2004-10-25 21:09           ` David Mansfield
  2004-10-25 20:56         ` Guy
  1 sibling, 1 reply; 13+ messages in thread
From: Jure Pečar @ 2004-10-25 20:48 UTC (permalink / raw)
  To: David Mansfield; +Cc: bugzilla, linux-raid

On Mon, 25 Oct 2004 16:35:32 -0400
David Mansfield <md@dm.cobite.com> wrote:

> The point is that NEITHER DRIVE 'FAILS'.  They just have unrecoverable
> read errors, or bad sectors.  As long as the two bad sectors are not in
> the same stripe, you have not lost any data (theoretically, for s/w and
> realistically for h/w).

As I see the problem, the definition of what is a "failed drive" is
different from sysadmin's point of view and from md's point of view.

Md freaks out on every single unrecoverable read error, but these usually do
not indicate a completely failed and dead drive. 

What needs to be done is to give md some more knowledge about disk errors,
disk behaviour at dying and possibly integrate it in some way with smartd.
This has been requested every now and then at least for the last three
years, however nobody started to work on something like this, at least I'm
not aware of any such activity.

How exactly this could / should be accomplished is an interesting topic too.

-- 

Jure Pečar
http://jure.pecar.org/

* Re: raid5, media scans and stripe-wise resync
  2004-10-25 20:48         ` Jure Pečar
@ 2004-10-25 21:09           ` David Mansfield
  0 siblings, 0 replies; 13+ messages in thread
From: David Mansfield @ 2004-10-25 21:09 UTC (permalink / raw)
  To: Jure Pečar; +Cc: bugzilla, linux-raid

On Mon, 2004-10-25 at 16:48, Jure Pečar wrote:
> On Mon, 25 Oct 2004 16:35:32 -0400
> David Mansfield <md@dm.cobite.com> wrote:
> 
> > The point is that NEITHER DRIVE 'FAILS'.  They just have unrecoverable
> > read errors, or bad sectors.  As long as the two bad sectors are not in
> > the same stripe, you have not lost any data (theoretically, for s/w and
> > realistically for h/w).
> 
> As I see the problem, the definition of what is a "failed drive" is
> different from sysadmin's point of view and from md's point of view.
> 

Exactly.  md always kicks the entire drive out.  Hardware raid doesn't
take these extreme measures if a sector reassignment can take care of
the problem immediately.  

In the md case, we are stuck with having to resync an entire disk, which
is terrible if there is another sector on a different disk that is bad.


> Md freaks out on every single unrecoverable read error, but these usually do
> not indicate a completely failed and dead drive. 
> 

In fact, very often an unrecoverable read error is an isolated defect.

> What needs to be done is to give md some more knowledge about disk errors,
> disk behaviour at dying and possibly integrate it in some way with smartd.
> This has been requested every now and then at least for the last three
> years, however nobody started to work on something like this, at least I'm
> not aware of any such activity.
> 

Ok.  Thanks for the info.  I tried googling and came up with nothing.

> How exactly this could / should be accomplished is an interesting topic too.
> 

If only I had an extra few months...

David




* RE: raid5, media scans and stripe-wise resync
  2004-10-25 20:35       ` David Mansfield
  2004-10-25 20:48         ` Jure Pečar
@ 2004-10-25 20:56         ` Guy
  1 sibling, 0 replies; 13+ messages in thread
From: Guy @ 2004-10-25 20:56 UTC (permalink / raw)
  To: 'David Mansfield'; +Cc: 'Jure Pečar', linux-raid

I understand your point.  The bad sector issue has been talked about here
many times.  Bad sectors have been a pain in the @$$ for me for 2 years.  If
you search the archives I am sure you would find a message from me with very
similar concerns.  I guess I was just pointing out that RAID6 will help.  I
think I have been lucky and have not had bad blocks on 2 disks at the same
time (not sure).  But I do understand that md can't deal with them.  Marking
a disk as failed when only 1 sector has failed is not a good solution.

And yes, most (maybe all) hardware RAID systems "correct" bad sectors.  Some
count them and predict the drive is bad based on too many "corrected" bad
sectors.  EMC's big RAID systems copy the failing disk to a spare and place
an auto service call.  The failing disk is not taken out of service until it
is physically replaced, since it still does have data and is working.  By
doing it this way the data is redundant during the whole process.  Very
clever.

Sorry if I went off topic.

Guy


* Re: raid5, media scans and stripe-wise resync
  2004-10-25 20:29     ` Guy
  2004-10-25 20:35       ` David Mansfield
@ 2004-10-25 22:02       ` Konstantin Olchanski
  2004-10-26  2:34         ` Guy
  1 sibling, 1 reply; 13+ messages in thread
From: Konstantin Olchanski @ 2004-10-25 22:02 UTC (permalink / raw)
  To: Guy; +Cc: 'David Mansfield', 'Jure Pečar', linux-raid

On Mon, Oct 25, 2004 at 04:29:09PM -0400, anybody wrote:
> 1 disk has a bad sector and another disk fails, game over.

On a single-disk, 1 bad sector kills just the one unlucky file.
On a degraded RAID0 or RAID5, 1 bad sector kills the filesystem.
On a healthy RAID0 or RAID5, 2 bad sectors on different disks kill the filesystem.

Does this make sense?
RAID is less fault tolerant than a single disk?

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada


* RE: raid5, media scans and stripe-wise resync
  2004-10-25 22:02       ` Konstantin Olchanski
@ 2004-10-26  2:34         ` Guy
  0 siblings, 0 replies; 13+ messages in thread
From: Guy @ 2004-10-26  2:34 UTC (permalink / raw)
  To: 'Konstantin Olchanski'
  Cc: 'David Mansfield', 'Jure Pečar', linux-raid

I have a cron job that tests each disk once per day.  This really helps.
But I have still had md find a bad sector.  But the risk of having 2 disks
with bad sectors is very low if you test each night.
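
As a rough illustration of what such a nightly job could look like (a
sketch, not the actual cron job), here is a read-only pass over a device
in Python that records the offsets where a read fails.  The function name
and the 1 MiB chunk size are assumptions, it needs read permission on the
device, and because it reads through the page cache a recently read
region may be served from memory rather than from the media.

```python
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` from start to end without writing anything.

    Returns the byte offsets of the chunks whose read raised an I/O
    error.  An empty list means every chunk was readable.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the failing chunk and carry on with the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device or file
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

A cron entry could call scan_for_read_errors("/dev/hda") (device name is a
placeholder) and mail the result when the list is not empty.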

Guy


* Re: raid5, media scans and stripe-wise resync
  2004-10-25 15:36 raid5, media scans and stripe-wise resync David Mansfield
  2004-10-25 17:19 ` Jure Pečar
@ 2004-10-25 19:39 ` Bruce Lowekamp
  2004-10-25 19:47   ` David Mansfield
  2004-10-26  9:56   ` berk walker
  1 sibling, 2 replies; 13+ messages in thread
From: Bruce Lowekamp @ 2004-10-25 19:39 UTC (permalink / raw)
  To: David Mansfield; +Cc: linux-raid

There was a recent conversation on this mailing list about
transparently recovering from read errors (essentially just rewriting
the bad stripe and letting the disk handle it), but I think it focused
on Raid 1.  It would be a natural for Raid 5 or 6, but I haven't seen
an experimental patch to do that.

If you just want to monitor, look at http://smartmontools.sourceforge.net
each of the drives in my array has a monitoring config:
/dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@cs.wm.edu

two weeks ago I got email that one disk had a bad read on a sector
during its weekly long scan (an entire surface scan).  I failed that
drive manually, waited until it resynced on the spare, overwrote the
entire drive to let the drive clear the sector (and make sure there
weren't any other problems), then reran the test and set that drive as
the spare.
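
Spelled out as commands, that procedure might look roughly like the plan
below.  This is a sketch that only builds and prints the command lines; it
assumes mdadm and smartctl are the tools in use, and the array and device
names are placeholders.  It overwrites the array member rather than the
whole drive.  Each step has to finish (resync complete, overwrite done,
self-test passed) before the next one is run by hand.

```python
def rotate_out_plan(array, member, disk):
    """Command lines, in order, for cycling a suspect member out of an md
    array and back in as a spare.  Nothing is executed here."""
    return [
        # Mark the member faulty so md rebuilds onto the hot spare.
        ["mdadm", array, "--fail", member],
        # Once /proc/mdstat shows the rebuild has finished, detach it.
        ["mdadm", array, "--remove", member],
        # Overwrite the member so the drive can remap any weak sectors.
        ["dd", "if=/dev/zero", "of={}".format(member), "bs=1M"],
        # Start the drive's long (full surface) self-test again.
        ["smartctl", "-t", "long", disk],
        # If the test comes back clean, return the member as a spare.
        ["mdadm", array, "--add", member],
    ]


if __name__ == "__main__":
    for command in rotate_out_plan("/dev/md0", "/dev/hdc1", "/dev/hdc"):
        print(" ".join(command))
```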

I'd still feel safer if it automatically overwrote only the sector
with the read error, but at least this way I knew that the other 9
drives had passed a surface scan just before, so I wasn't likely to
run into a second read failure on rebuild.

Bruce




-- 
Bruce Lowekamp  (lowekamp@cs.wm.edu)
Computer Science Dept, College of William and Mary


* Re: raid5, media scans and stripe-wise resync
  2004-10-25 19:39 ` Bruce Lowekamp
@ 2004-10-25 19:47   ` David Mansfield
  2004-10-26  9:56   ` berk walker
  1 sibling, 0 replies; 13+ messages in thread
From: David Mansfield @ 2004-10-25 19:47 UTC (permalink / raw)
  To: Bruce Lowekamp; +Cc: linux-raid

On Mon, 2004-10-25 at 15:39, Bruce Lowekamp wrote:
> There was a recent conversation on this mailing list about
> transparently recovering from read errors (essentially just rewriting
> the bad stripe and letting the disk handle it), but I think it focused
> on Raid 1.  It would be a natural for Raid 5 or 6, but I haven't seen
> an experimental patch to do that.
> 
> If you just want to monitor, look at http://smartmontools.sourceforge.net
> each of the drives in my array has a monitoring config:
> /dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@cs.wm.edu
> 

Thanks for the reference.

> two weeks ago I got email that one disk had a bad read on a sector
> during its weekly long scan (an entire surface scan).  I failed that
> drive manually, waited until it resynced on the spare, overwrote the
> entire drive to let the drive clear the sector (and make sure there
> weren't any other problems), then reran the test and set that drive as
> the spare.
> 

Check out the utility 'scu' at the url: 
http://www.bit-net.com/%7Ermiller/scu.html

It will allow you to 'reassign' the block directly by accessing the scsi
commands.  I've tried the rewrite method you used above, and once or
twice had problems.

> I'd still feel safer if it automatically overwrote only the sector
> with the read error, but at least this way I knew that the other 9
> drives had passed a surface scan just before, so I wasn't likely to
> run into a second read failure on rebuild.
> 

Yeah.  After scanning all disks you are reasonably assured.  But should
it happen that there are two defects, you are completely screwed.  No
way around it, I think.

I'd really like a way to resync a single stripe...

David



* Re: raid5, media scans and stripe-wise resync
  2004-10-25 19:39 ` Bruce Lowekamp
  2004-10-25 19:47   ` David Mansfield
@ 2004-10-26  9:56   ` berk walker
  1 sibling, 0 replies; 13+ messages in thread
From: berk walker @ 2004-10-26  9:56 UTC (permalink / raw)
  Cc: linux-raid

One problem with doing a surface scan which writes and reads back the 
data is that in the event of weak/worn media, the data can appear to be 
good, but degrade quickly (mag fields go soft).  Just my own 2 cents, 
but the sick fella should be shot and buried immediately, no second chances.
b-


end of thread, other threads:[~2004-10-26  9:56 UTC | newest]
