linux-raid.vger.kernel.org archive mirror
From: berk walker <berk.walker@verizon.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5, media scans and stripe-wise resync
Date: Tue, 26 Oct 2004 05:56:42 -0400
Message-ID: <417E1F5A.5020902@verizon.net>
In-Reply-To: <e9132f82041025123926140ece@mail.gmail.com>

One problem with a surface scan that writes data and reads it back is
that, on weak or worn media, the data can appear to be good but then
degrade quickly (the magnetic fields go soft).  Just my own 2 cents,
but a sick drive should be shot and buried immediately, no second chances.
b-
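
For contrast with the write-and-read-back scan described above, a read-only media scan can be sketched in Python: read the device sequentially in fixed-size blocks and record the offsets of any blocks whose read fails with an I/O error. This is a minimal illustration under stated assumptions, not how smartd, SCU or the 3ware firmware work. The function name `scan_readable` and the 64 KiB block size are hypothetical choices, reads go through the page cache (so recently cached blocks may never touch the platters), and nothing is rewritten or reassigned.

```python
import os


def scan_readable(path: str, block_size: int = 64 * 1024) -> list[int]:
    """Read-only scan of a block device or file (hypothetical helper).

    Reads `path` from start to end in `block_size` chunks and returns the
    byte offsets of chunks whose read raised OSError (e.g. EIO on a bad
    sector). Nothing is written, so this scan never triggers a remap.
    """
    bad_offsets: list[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a regular file and,
        # on Linux, of a block device as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                # Record the failing chunk and keep scanning the rest.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bad_offsets
```

A non-empty result only says which 64 KiB regions deserve a closer look, since each one spans many sectors; deciding what to do about them is the part the thread is actually arguing about.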

Bruce Lowekamp wrote:

>There was a recent conversation on this mailing list about
>transparently recovering from read errors (essentially just rewriting
>the bad stripe and letting the disk handle it), but I think it focused
>on RAID 1.  It would be a natural fit for RAID 5 or 6, but I haven't
>seen an experimental patch to do that.
>
>If you just want to monitor, look at smartmontools
>(http://smartmontools.sourceforge.net).  Each of the drives in my array
>has a monitoring config:
>/dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@cs.wm.edu
>
>Two weeks ago I got an email saying that one disk had a bad read on a
>sector during its weekly long self-test (a full surface scan).  I failed
>that drive manually, waited until the array resynced onto the spare,
>overwrote the entire drive so the drive could clear the sector (and to
>make sure there weren't any other problems), then reran the test and set
>that drive as the new spare.
>
>I'd still feel safer if it automatically overwrote only the sector
>with the read error, but at least this way I knew that the other 9
>drives had passed a surface scan just before, so I wasn't likely to
>run into a second read failure on rebuild.
>
>Bruce
>
>
>On Mon, 25 Oct 2004 11:36:33 -0400, David Mansfield <md@dm.cobite.com> wrote:
>
>>Hi everyone,
>>
>>After a few recent severe RAID failures (one linux md, one 3ware), my
>>understanding of linux md, and my fear of it, have greatly increased.
>>Single-sector unrecoverable errors are doing us in!
>>
>>To alleviate these fears, we (my coworkers and I) believe we need to
>>start a policy of conducting a 'background media scan' of the actual
>>underlying physical devices in a RAID 5.  This is easily accomplished on
>>the 3ware (it's built in), but we are struggling with linux md.
>>
>>A utility called SCU, http://www.bit-net.com/%7Ermiller/scu.html, will
>>allow us to scan the media and, if necessary, reassign the bad blocks.
>>We have used it on SCSI disks before, and it seems to work as a
>>low-level tool.
>>
>>However! If two bad blocks are discovered on two different disks in the
>>RAID 5 (even if the bad blocks are in different stripes), we will be
>>screwed: the RAID system will kick out the first disk as soon as its bad
>>sector is found, and reconstruction will then fail when the second bad
>>sector is found.  Screwed.
>>
>>Which brings me (finally) to my questions:
>>
>>1) Does linux md have a plan for integrating background media scanning
>>and automatic sector reassignment, as hardware solutions do?
>>
>>2) How can we force (or manually perform) a stripe-wise resync?  Is it
>>possible to take the RAID offline completely, read the data with dd,
>>compute the parity manually, reassign the bad block using SCU, rewrite
>>the parity block with dd, and then put the RAID online again?
>>
>>If #2 is possible, I'm sure a quick-and-dirty perl script could be
>>created to do the work, which I'd be happy to do, if it's theoretically
>>doable.
>>
>>Thanks,
>>David
>>
>
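
To read the `-s` schedule in Bruce's smartd.conf line: per the smartd.conf(5) man page, smartd matches that regular expression against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday and 7 = Sunday, hour). So `S/../.././02` asks for a short self-test every day at 02:00, and `L/../../6/07` asks for a long self-test every Saturday at 07:00. The Python sketch below uses a hypothetical helper, `scheduled_tests`, that rebuilds that string for a given time and reports which of the single-letter types L, S, C and O the pattern matches. It assumes the whole string must match, and it ignores the other directives on the line (`-a`, `-o on`, `-S on`, `-R 194`, `-m`).

```python
import re
from datetime import datetime

# The -s schedule from the smartd.conf line quoted above.
SCHEDULE = re.compile(r"(S/../.././02|L/../../6/07)")


def scheduled_tests(when: datetime, schedule: re.Pattern = SCHEDULE) -> list[str]:
    """Return the self-test types whose T/MM/DD/d/HH string matches `schedule`.

    Hypothetical helper: it rebuilds the string described in smartd.conf(5)
    (test type, month, day of month, ISO day of week with 1 = Monday, hour)
    and requires the whole string to match.
    """
    due = []
    for test_type in "LSCO":  # long, short, conveyance, offline immediate
        key = (f"{test_type}/{when:%m}/{when:%d}/"
               f"{when.isoweekday()}/{when:%H}")
        if schedule.fullmatch(key):
            due.append(test_type)
    return due
```

For example, `scheduled_tests(datetime(2004, 10, 30, 7))` (a Saturday) returns `['L']`, `scheduled_tests(datetime(2004, 10, 26, 2))` returns `['S']`, and any other hour returns an empty list.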

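On question 2: within each stripe, RAID 5 parity is the byte-wise XOR of the data chunks, so a single unreadable chunk can be recomputed by XORing the surviving chunks (parity included), while two unreadable chunks in the same stripe cannot. That is why bad sectors on two disks are survivable in principle when they fall in different stripes, provided the repair is done stripe by stripe rather than by failing a whole disk. The Python sketch below shows only that arithmetic. The function names are hypothetical, and it assumes the chunks have already been read (e.g. with dd) from the correct offsets; mapping a stripe to device offsets depends on md's chunk size and parity layout (e.g. left-symmetric), which are not modelled here.

```python
from typing import Optional, Sequence


def xor_chunks(chunks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks: the RAID 5 parity operation."""
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks in a stripe must have the same length")
    acc = 0
    for c in chunks:
        # XOR is associative and commutative, so folding each chunk in as a
        # big integer gives the same result as XORing byte by byte.
        acc ^= int.from_bytes(c, "big")
    return acc.to_bytes(length, "big")


def stripe_is_consistent(stripe: Sequence[bytes]) -> bool:
    """True if the XOR of all data chunks and the parity chunk is all zeros."""
    return xor_chunks(stripe) == bytes(len(stripe[0]))


def rebuild_stripe(stripe: Sequence[Optional[bytes]]) -> list[bytes]:
    """Recompute at most one unreadable chunk (None) from the survivors.

    `stripe` holds the data chunks and the parity chunk of one stripe, in any
    order. Two or more missing chunks cannot be recovered from parity alone.
    """
    missing = [i for i, c in enumerate(stripe) if c is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} unreadable chunks in one stripe")
    repaired = list(stripe)
    if missing:
        survivors = [c for c in stripe if c is not None]
        repaired[missing[0]] = xor_chunks(survivors)
    return repaired
```

With `data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]`, `xor_chunks(data)` is `b"\xee\x22"`, and `rebuild_stripe([data[0], None, data[2], b"\xee\x22"])` recovers `b"\x10\x20"` in the second slot. Writing a recomputed chunk back over the bad sector is what would give the drive its chance to reassign it, in the spirit of the "rewrite the bad stripe" approach Bruce mentions.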
