Re: raid and sleeping bad sectors

From: Michael Hardy <mhardy@h3c.com>
To: linux-raid@vger.kernel.org
Subject: Re: raid and sleeping bad sectors
Date: Wed, 30 Jun 2004 19:42:22 -0700
Message-ID: <40E37A0E.90102@h3c.com>
In-Reply-To: <200407010152.i611qo300317@watkins-home.com>


The last I heard Neil write on the subject (granted I've only been on 
the list a couple weeks) was that it would just require an alteration in 
the code to do the auto-write-bad-block-on-read-error.

To me, that read as "submit a patch; if it's good, I'll take it"
(apologies to Neil if I'm wrong - I'm definitely not qualified to speak
for him).

I haven't seen anyone disagree with this strategy of trying to force the
hardware to remap when you have valid redundant data - and I've asked
several people I know offline.

The "plan b" of software-remapping write errors is probably more 
contentious, but the "plan a" of read error remaps does seem easy and 
non-controversial.
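
For what it's worth, here is roughly what I understand "plan a" to mean,
as a toy user-space Python sketch and nothing like real md code. SimDisk,
ReadError and read_with_repair are names I made up for illustration, and
the simulated disk simply assumes that writing to a bad sector makes the
drive remap it.

```python
class ReadError(Exception):
    """Raised by the simulated disk when a sector cannot be read."""


class SimDisk:
    """Hypothetical in-memory disk. Writing to a bad sector 'remaps' it."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()      # sector numbers that currently fail on read
        self.remapped = 0     # sectors the pretend drive has relocated

    def read(self, n):
        if n in self.bad:
            raise ReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        if n in self.bad:     # a real drive would switch to a spare sector
            self.bad.discard(n)
            self.remapped += 1
        self.blocks[n] = data


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def read_with_repair(disks, idx, n):
    """Read block n from disks[idx]. On a read error, rebuild the block
    from the other members and write it back so the drive can remap."""
    try:
        return disks[idx].read(n)
    except ReadError:
        others = [d.read(n) for i, d in enumerate(disks) if i != idx]
        data = xor_blocks(others)   # mirror: a plain copy; raid5: XOR
        disks[idx].write(n, data)
        return data


# Three-member raid5-style stripe: two data blocks plus their XOR parity.
d0 = SimDisk([b"\x01\x02"])
d1 = SimDisk([b"\x10\x20"])
parity = SimDisk([xor_blocks([b"\x01\x02", b"\x10\x20"])])
d1.bad.add(0)                                    # sector 0 on d1 goes bad
print(read_with_repair([d0, d1, parity], 1, 0))  # rebuilt from d0 and parity
print(d1.remapped)                               # counts the forced remap
```

If a second member of the same stripe also fails to read, the sketch just
lets the error propagate, which is the case where you really have lost
redundancy and the disk should be kicked.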

It's just that no one has stepped up with code.

I don't have the time to do it myself, but I too run smartd; it runs
long self-tests for me and occasionally reports errors before md finds
them, so I'd like the solution too. If anyone does code it up and gets a
patch accepted, I'd definitely ship a case of beer (or equivalent) in
their direction...

-Mike

Guy wrote:
> "And where do you propose the system would store all the info about
> badblocks?"
> 
> Simple, this is an 8 or 16 bit value per device.  I am sure we could find 16
> bits!  If the device is replaced we don't need the info anymore, so store it
> on the device!  In the superblock maybe?  Once the disk fails it would be
> nice for md to log the current value, just so we know.
> 
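
To make the 16-bit idea concrete, here is a small user-space Python
sketch. The field layout, the names read_counter, bump_counter and
needs_alert, and the threshold value are all my own assumptions; I am not
describing an existing superblock field. It also assumes md would count
the corrections it performs itself, since it cannot see the drive's own
relocation count directly.

```python
import struct

COUNTER_FORMAT = "<H"    # assumption: one unsigned 16-bit little-endian field
COUNTER_MAX = 0xFFFF


def read_counter(raw):
    """Decode the per-device correction count from its 2-byte field."""
    (count,) = struct.unpack(COUNTER_FORMAT, raw)
    return count


def bump_counter(raw):
    """Return the field with the count incremented, saturating at the max."""
    return struct.pack(COUNTER_FORMAT, min(read_counter(raw) + 1, COUNTER_MAX))


def needs_alert(raw, threshold):
    """True once the count has reached the configured alert threshold."""
    return read_counter(raw) >= threshold


field = struct.pack(COUNTER_FORMAT, 0)   # fresh disk, nothing corrected yet
for _ in range(3):                        # three corrected read errors
    field = bump_counter(field)
print(read_counter(field), needs_alert(field, threshold=5))
```

Saturating instead of wrapping means a very sick disk never looks healthy
again just because the counter overflowed.
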
> About the disk test.  I do a disk test each night.  That's my point!!!  I
> don't think I should do the test.  If the test fails I need to correct it.
> Let md test things, and correct them, and send an alert if it can't correct
> it, or if a threshold is exceeded!
> 
> Paranoid?  You been using computers long?  I guess not.  In time you will
> learn!  :)  If any block in the stripe gets hosed (parity or not) when you
> replace a disk, during the re-build the constructed data will be wrong, even
> if it was correct on the failed disk.  The corruption now affects 2 disks.
> Yes, I want to verify the parity.  Can be just a utility that gives a
> report.  With RAID5 you can't determine which disk is corrupt!  Only that
> the parity does not match the data.  If the corruption was in the parity,
> re-writing the parity would correct it.  If the corruption is in the data,
> re-writing the parity will prevent spreading the corruption to another disk
> during the next re-build.  With RAID6 I think you could determine which disk
> is corrupt and correct it!
> 
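
The RAID5 point above is easy to show with plain XOR parity. This is only
a Python illustration with made-up block contents and helper names
(xor_blocks, stripe_is_consistent), not a proposal for how a real verify
tool would read the array.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """True when the stored parity equals the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity


data = [b"AB", b"CD", b"EF"]
parity = xor_blocks(data)
print(stripe_is_consistent(data, parity))            # clean stripe

bad_data = [b"AB", b"CX", b"EF"]                     # one data block hosed
print(stripe_is_consistent(bad_data, parity))        # mismatch

bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]  # parity block hosed
print(stripe_is_consistent(data, bad_parity))        # same symptom

# Rewriting parity makes the stripe self-consistent again, whichever
# block was actually wrong, so a later rebuild reproduces what is on disk.
print(stripe_is_consistent(bad_data, xor_blocks(bad_data)))
```

Both corruptions show up as the same mismatch, so with a single parity
block the check can report a problem but cannot say which member is at
fault.
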
> Neil?  Any thoughts?  You have been silent on this subject.
> 
> Guy
> -------------------------------------------------------------------------
> Spock - "If I drop a wrench on a planet with a positive gravity field, I
> need not see it fall, nor hear it hit the ground, to know that it has in
> fact fallen."
> 
> Guy - "Logic dictates that there are numerous events that could prevent the
> wrench from hitting the ground.  I need to verify the impact!"
> 
> 
> 
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Jure Pečar
> Sent: Wednesday, June 30, 2004 7:27 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: raid and sleeping bad sectors
> 
> On Wed, 30 Jun 2004 18:44:16 -0400
> "Guy" <bugzilla@watkins-home.com> wrote:
> 
> 
>>I want plan "a".  I want the system to correct the bad block by re-writing
>>it!  I want the system to count the number of times blocks have been
>>re-located by the drive.  I want the system to send an alert when a limit
>>has been reached.  This limit should be before the disk runs out of spare
>>blocks.  I want the system to periodically verify all parity data and
>>mirrors.  I want the system to periodically do a surface scan (would be a
>>side effect of verify parity).
> 
> 
> And where do you propose the system would store all the info about
> badblocks?
> 
> I have an old hw raid controller for my alpha box that maintains a badblock
> table in its nvram. I guess it's a common feature in hw raid cards, since I
> had a whole box of disks with firmware that reported each internal badblock
> relocation as a scsi hardware error. Needless to say, linux sw raid freaked
> out on each such event. Things were very interesting until we got a firmware
> upgrade for those disks ...
> Also, at least 3ware cards do a 'nightly maintenance' of disks, which I guess
> is something like dd if=/dev/hdX of=/dev/null ... What is holding you back
> from doing this with a simple shell script and a cron entry?
> Now for checking the parity in raid5/6 setups, some kind of tool would
> be needed ... maybe some extension to mdadm? For the really paranoid people
> out there ... :)
> 
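
The read-only pass suggested above can also be written so that it keeps
going after an error and reports where the failures were. This is a
hedged Python sketch for Unix-like systems; surface_scan and its chunk
size are my own hypothetical choices, and like a plain dd it may be
served from the page cache rather than the platters.

```python
import os


def surface_scan(path, chunk_size=1024 * 1024):
    """Read the whole device or file in chunks and return the starting
    offsets of the chunks that could not be read."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # total size in bytes
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(chunk_size, size - offset), offset)
            except OSError:
                bad_offsets.append(offset)    # note it and keep scanning
            offset += chunk_size
    finally:
        os.close(fd)
    return bad_offsets


# Example (needs read access to the device):
# print(surface_scan("/dev/hdX"))
```

It only finds unreadable regions. It does nothing to correct them, which
is the part that has to happen inside md where the redundant data lives.
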
> 
