linux-raid.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: Theodore Tso <tytso@MIT.EDU>,
	Peter Rabbitson <rabbit+list@rabbit.us>,
	Bas van Schaik <bas@tuxes.nl>,
	linux-raid@vger.kernel.org
Subject: Re: Redundancy check using "echo check > sync_action": error reporting?
Date: Tue, 25 Mar 2008 11:17:36 -0400
Message-ID: <47E91790.2040101@tmr.com>
In-Reply-To: <18408.33596.390197.465267@notabene.brown>

Neil Brown wrote:
> On Saturday March 22, tytso@MIT.EDU wrote:
>   
>> On Fri, Mar 21, 2008 at 06:35:43PM +0100, Peter Rabbitson wrote:
>>     
>>> Of course it would be possible to instruct md to always read all 
>>> data+parity chunks and make a comparison on every read. The performance 
>>> would not be much to write home about though.
>>>       
>> Yeah, and that's probably the real problem with this scheme.  You
>> basically reduce the read bandwidth of your array down to a single
>> (slowest) disk --- basically the same reason why RAID-2 is a
>> commercial failure.  
>>     
>
> Exactly.
>
>   
In some cases that would be acceptable. Obviously in the general case 
it's not required.
>> I suspect the best thing we *can* do, for filesystems that
>> include checksums in the metadata and/or the data blocks, is, if the
>> CRC doesn't match, to have the filesystem tell the RAID subsystem,
>> "um, could you send me copies of the data from all of the RAID-1
>> mirrors, and see if one of the copies from the mirrors causes a valid
>> checksum".  Something similar could be done with RAID-5/RAID-6 arrays,
>> if the fs layer could ask the RAID subsystem, "the external checksum
>> for this block is bad; can you recalculate it from all available
>> parity stripes assuming the data stripe is invalid".
>>     
>
> Something along these lines would be very appropriate I think.
> Particularly for raid1.
> For raid5/raid6 it is possible that a valid block in the same stripe
> was read and written before the faulty block was read.  This would
> correct the parity so when the bad block was found, there would be no
> way to recover the correct data.
> Still, having the possibility of recovery might be better than not
> having it.
>
>   
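To make the proposal above concrete, here is a minimal Python sketch of
both cases. It is only an illustration under stated assumptions, not
anything md or a filesystem implements today: the function names are
hypothetical, CRC32 stands in for whatever checksum the filesystem
keeps, and RAID-5 parity is taken to be a plain XOR of the data chunks.

```python
import zlib
from functools import reduce


def pick_good_mirror(copies, expected_crc):
    """RAID-1 case: return the first mirror copy whose CRC32 matches, else None."""
    for copy in copies:
        if zlib.crc32(copy) == expected_crc:
            return copy
    return None


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_from_parity(other_data, parity, expected_crc):
    """RAID-5 case: treat the suspect chunk as missing and rebuild it.

    The rebuilt chunk is accepted only if it matches the external checksum;
    otherwise None is returned.
    """
    candidate = xor_blocks(list(other_data) + [parity])
    return candidate if zlib.crc32(candidate) == expected_crc else None
```

Note that rebuild_from_parity() returns None when the parity was
recomputed after the bad chunk hit the disk, which is exactly the
limitation Neil describes: by then the parity agrees with the
corrupted data and nothing can be recovered from it.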
>> As far as the question of how often this happens, where a disk
>> silently corrupts a block without returning a media error, it
>> definitely happens.  Larry McVoy tells a story of periodically running
>> a per-file CRC across backup/archival filesystems, and was able to
>> detect files that had not been modified changing out from under him.
>> One way this can happen is if the disk accidentally writes some block
>> to the wrong location on disk; the blockguard extension and various
>> enterprise databases (since they can control their db-specific on-disk
>> format) will encode the intended location of a block in their
>> per-block checksums, to detect this specific type of failure, which
>> should be a broad hint that this sort of thing can and does happen.
>>     
>
> The "address data was corrupted" is certainly a credible possibility.
> I remember reading that SCSI has a parity check for data, but not for
> the command, which includes the storage address.
>
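The idea of encoding the intended location in the per-block checksum
can be sketched in a few lines of Python. This is a hedged illustration
only: it is not the actual blockguard layout or any database's on-disk
format, and seal(), verify() and the 64-bit little-endian address
encoding are assumptions made for the example.

```python
import struct
import zlib


def seal(lba, payload):
    """Append a CRC32 computed over the intended block address plus the payload."""
    crc = zlib.crc32(struct.pack("<Q", lba) + payload)
    return payload + struct.pack("<I", crc)


def verify(lba, stored):
    """Check a sealed block against the address it was actually read from."""
    payload = stored[:-4]
    (crc,) = struct.unpack("<I", stored[-4:])
    return zlib.crc32(struct.pack("<Q", lba) + payload) == crc
```

A block sealed for address 100 verifies when read back from 100, but
fails verification if the drive misdirected the write and it turns up
at address 200, even though the payload itself is intact.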
> With the raid6 algorithm, we can tell which device has an error
> (assuming only one device does) for each byte in the block.
> If this returns the same device for every block in a sector, it is
> probably reasonable to assume that exactly that block is bad.
> Still, if we only do that on the monthly 'check', it could be too
> late.
>
>   
I think the old saying "better late than never" applies: once the user
learns of a problem via 'check' and fixes it if possible, some form of
recovery is at least possible.
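
For reference, the per-byte location trick Neil mentions can be
sketched in Python as below. This is a rough illustration, not the md
code: it assumes the commonly documented RAID-6 construction (P is the
XOR of the data bytes, Q is the sum of g**i times the byte on data
device i in GF(2^8), with g = 2 and polynomial 0x11d), and all function
names are hypothetical. It is only meaningful if at most one device is
wrong.

```python
def build_tables():
    """Exponent and log tables for GF(2^8) with polynomial 0x11d, generator 2."""
    exp = [0] * 510
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 510):
        exp[i] = exp[i - 255]  # duplicate so LOG[a] + LOG[b] never overflows
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Compute the P and Q blocks for a list of equal-length data blocks."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for idx, block in enumerate(data):
        coeff = EXP[idx]  # g**idx for data device idx
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_bad_device(data, p_stored, q_stored):
    """Return 'none', 'P', 'Q', a data device index, or 'ambiguous'."""
    p_calc, q_calc = syndromes(data)
    suspects = set()
    for pc, ps, qc, qs in zip(p_calc, p_stored, q_calc, q_stored):
        dp, dq = pc ^ ps, qc ^ qs
        if dp == 0 and dq == 0:
            continue  # this byte position is consistent
        if dq == 0:
            suspects.add("P")
        elif dp == 0:
            suspects.add("Q")
        else:
            # dp = E and dq = g**z * E, so z = log(dq) - log(dp) mod 255
            z = (LOG[dq] - LOG[dp]) % 255
            suspects.add(z if z < len(data) else "ambiguous")
    if not suspects:
        return "none"
    return suspects.pop() if len(suspects) == 1 else "ambiguous"
```

If every inconsistent byte points at the same device, that device is
the likely culprit; if the bytes disagree, the sketch reports
'ambiguous' rather than guessing. With two or more bad devices the
answer can be confidently wrong, which is the "assuming only one device
does" caveat above.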

> I'm not sure that "surviving some data corruptions, if you are lucky"
> is really better than surviving none.  We don't want to provide a
> false sense of security.... but maybe RAID already does that.
>
> A filesystem that always writes full stripes and never over-writes
> valid data, and that (optionally) stores checksums for everything, is
> looking more and more appealing.   The trouble is, I don't seem to have
> enough "spare time" :-)
>   

Frankly, I think your limited time is better spent on raid; there are 
undoubtedly plenty of things on your "to do" list. I'd like to hope that 
raid5e is at least on that list, but I would be the first to say that 
performance improvements for raid5 would benefit more people.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck


