From: berk walker <berk.walker@verizon.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5, media scans and stripe-wise resync
Date: Tue, 26 Oct 2004 05:56:42 -0400
Message-ID: <417E1F5A.5020902@verizon.net>
In-Reply-To: <e9132f82041025123926140ece@mail.gmail.com>
One problem with doing a surface scan which writes and reads back the
data is that on weak/worn media, the data can read back as good right
after the write but then degrade quickly (mag fields go soft). Just my
own 2 cents, but the sick fella should be shot and buried immediately,
no second chances.
b-
Bruce Lowekamp wrote:
>There was a recent conversation on this mailing list about
>transparently recovering from read errors (essentially just rewriting
>the bad stripe and letting the disk handle it), but I think it focused
>on Raid 1. It would be a natural for Raid 5 or 6, but I haven't seen
>an experimental patch to do that.
>
>If you just want to monitor, look at http://smartmontools.sourceforge.net.
>Each of the drives in my array has a monitoring config like this:
>/dev/hda -a -o on -S on -R 194 -s (S/../.././02|L/../../6/07) -m lowekamp@cs.wm.edu
>
>Two weeks ago I got an email that one disk had a bad read on a sector
>during its weekly long scan (an entire surface scan). I failed that
>drive manually, waited until it resynced on the spare, overwrote the
>entire drive to let the drive clear the sector (and make sure there
>weren't any other problems), then reran the test and set that drive as
>the spare.
>
>I'd still feel safer if it automatically overwrote only the sector
>with the read error, but at least this way I knew that the other 9
>drives had passed a surface scan just before, so I wasn't likely to
>run into a second read failure on rebuild.
>
>Bruce
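
The -s directive in the config line quoted above takes a regular expression that smartd compares against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday through 7 = Sunday, hour). A minimal Python sketch of how that schedule reads is shown here. The function name due_tests is hypothetical, and the whole-string match and the use of Python's re module in place of smartd's own regex engine are assumptions made for illustration only.

```python
import re
from datetime import datetime

# Schedule regex copied verbatim from the -s directive quoted above.
SCHEDULE = r"(S/../.././02|L/../../6/07)"

def due_tests(when):
    """Return the self-test letters (S = short, L = long) scheduled for this hour.

    Hypothetical helper: it only mimics the T/MM/DD/d/HH matching idea and
    does not talk to smartd or to any drive.
    """
    due = []
    for test_type in "LS":
        # isoweekday() gives 1 (Monday) .. 7 (Sunday), the same numbering
        # as the d field of the schedule string.
        stamp = "{}/{:02d}/{:02d}/{}/{:02d}".format(
            test_type, when.month, when.day, when.isoweekday(), when.hour
        )
        if re.fullmatch(SCHEDULE, stamp):
            due.append(test_type)
    return due

if __name__ == "__main__":
    # 23 Oct 2004 was a Saturday, 26 Oct 2004 a Tuesday.
    print(due_tests(datetime(2004, 10, 23, 7)))
    print(due_tests(datetime(2004, 10, 26, 2)))
```

Read this way, the quoted schedule asks for a short self-test every day in the 02:00 hour and a long (full surface) self-test on Saturdays in the 07:00 hour, which matches the "weekly long scan" described in the message.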
>
>
>On Mon, 25 Oct 2004 11:36:33 -0400, David Mansfield <md@dm.cobite.com> wrote:
>
>
>>Hi everyone,
>>
>>After a few recent severe raid failures (one linux md, one 3ware), both
>>my understanding of linux md and my fear of it have greatly increased.
>>Single-sector unrecoverable errors are doing us in!
>>
>>To alleviate these fears, we (my coworkers and I) believe we need to
>>start a policy of conducting a 'background media scan' of the actual
>>underlying physical devices in a raid 5. This is easily accomplished on
>>the 3ware (it's built in), but we are struggling with linux md.
>>
>>A utility called SCU, http://www.bit-net.com/%7Ermiller/scu.html, will
>>allow us to scan the media, and, if necessary, reassign the bad blocks.
>>We have used this on SCSI disks before, and it seems to work as a
>>low-level tool.
>>
>>However! If two bad blocks are discovered on two different disks in the
>>raid 5 (even if the bad blocks are in different stripes), we will be
>>screwed, because the raid system will kick out the disk immediately when
>>the first bad sector is found, and then reconstruction will fail when
>>the second bad sector is found. screwed.
>>
>>Which brings me (finally) to my questions:
>>
>>1) does linux md have a plan for integrating background media scanning
>>and automatic sector reassignment like hardware solutions have?
>>
>>2) how can we force (or manually perform) a stripe-wise resync? is it
>>possible to take the raid offline completely, read the data with dd,
>>compute the parity manually, reassign the bad block using SCU and
>>rewrite the parity block with dd then put the raid online again?
>>
>>If #2 is possible, I'm sure a quick-and-dirty perl script could be
>>created to do the work, which I'd be happy to do, if it's theoretically
>>doable.
>>
>>Thanks,
>>David
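
The "compute the parity manually" step in question 2 rests on the fact that RAID 5 parity is the byte-wise XOR of the data blocks in a stripe, so any one missing block is the XOR of all the others. A minimal Python sketch of just that arithmetic follows. The name xor_blocks and the toy four-byte blocks are hypothetical, and the sketch deliberately ignores everything needed to find the right blocks on a real md array (chunk size, parity rotation, superblock and data offsets), so it is an illustration of the idea, not a recovery tool.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    # zip(*blocks) yields one tuple of bytes per byte position in the stripe.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

if __name__ == "__main__":
    # Toy stripe: three data blocks, as if read with dd from three disks.
    data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
    parity = xor_blocks(data)

    # Pretend data[1] sat on the unreadable sector: rebuild it from the
    # surviving data blocks plus the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

The same function serves both directions: XOR of all data blocks gives the parity block to rewrite, and XOR of the survivors plus parity gives back a lost data block.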