From: David Brown <david.brown@hesbynett.no>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: Roy Sigurd Karlsbakk <roy@karlsbakk.net>,
	Linux Raid <linux-raid@vger.kernel.org>
Subject: Re: Checksumming RAID?
Date: Tue, 27 Nov 2012 14:05:30 +0100
Message-ID: <50B4BA9A.4070504@hesbynett.no>
In-Reply-To: <50B4B2AA.5010809@fastmail.fm>

On 27/11/2012 13:31, Bernd Schubert wrote:
> On 11/27/2012 12:20 PM, David Brown wrote:
>> I can certainly sympathise with you, but I am not sure that data
>> checksumming would help here.  If your hardware raid sends out nonsense,
>> then it is going to be very difficult to get anything trustworthy.  The
>
> When a single hardware unit (any kind of block device) in a
> raid-level > 0 decides to send wrong data, correct data always can be
> reconstructed. You only need to know which unit it is - checksums help
> to figure that out.

If checksums (as described in the paper) only "help" to figure that out,
then they are not good enough - you can only do automatic on-the-fly
correction if you are /sure/ which device is the problem (or at least
sure to a very high probability).  I think that adding an extra
checksum block to the stripe only gives an indication of the problem
disk (or lower-level raid) - without being sure of the order in which
data hits the different disks (or lower-level raids), I don't think it
is reliable enough.  (I could be wrong in all this - I'm just waving
around ideas, and have no experience with big arrays.)
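
A rough sketch of the distinction being argued here, not taken from the
paper under discussion: parity by itself only shows that a stripe is
inconsistent, while a checksum per block can name the device to rebuild.
The stripe layout, the use of CRC32, and every name below are
assumptions made for illustration, and the sketch simply assumes the
stored checksums are trustworthy, which is exactly the part in doubt
above.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Hypothetical stripe: three data blocks plus one XOR parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Assumed per-block checksums, stored somewhere that can be trusted.
sums = [zlib.crc32(b) for b in data]

# Device 1 silently returns wrong data on a later read.
read = [data[0], b"BxBB", data[2]]

# Parity alone only says that something in the stripe is inconsistent.
parity_mismatch = xor_blocks(read) != parity

# Per-block checksums point at the offending device.
bad = [i for i, b in enumerate(read) if zlib.crc32(b) != sums[i]]

# With exactly one suspect, rebuild it from parity and the other blocks.
repaired = list(read)
if len(bad) == 1:
    others = [b for i, b in enumerate(read) if i != bad[0]]
    repaired[bad[0]] = xor_blocks(others + [parity])
```

If a checksum and its data block can reach the disks in different
orders, a mismatch no longer says which of the two is stale, so the
`len(bad) == 1` test stops being a safe trigger for automatic repair.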

>
>> obvious answer here is to throw out the broken hardware raid and use a
>> system that works - but it is equally obvious that that is easier said
>> than done!  But I would find it hard to believe that this is a common
>> issue with hardware raid systems - it goes against the whole point of
>> data storage.
>
> With disks it is not that uncommon. But yes, hardware raid controllers
> usually do not scramble data.

With disks it /is/ uncommon.  /Detected/ disk errors are not a problem -
the disk's own ECC system finds that it has an unrecoverable error and
returns a read error, and the raid system replaces the data using the
rest of the stripe.  It is /undetected/ disk errors that are a problem.
Typical figures I have seen are around 1 in 1e12 4KB blocks - or 1 in
3e16 bits.  If you've got a 1 PB disk array, that's one error for every
four full reads - which is certainly enough to be relevant, but I
wouldn't say it is "not that uncommon".
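
The arithmetic behind those figures can be checked with a few lines.
This is only a back-of-envelope sketch: the error rate is the rough
figure quoted above, and treating 1 PB as 1e15 bytes is an assumption.

```python
# Figure from the text: roughly one undetected error per 1e12 blocks of 4 KB.
BLOCK_BYTES = 4096
BLOCKS_PER_ERROR = 1e12

# Bits read, on average, per undetected error (about 3.3e16).
bits_per_error = BLOCKS_PER_ERROR * BLOCK_BYTES * 8

# Assumption: 1 PB taken as a decimal petabyte, 1e15 bytes.
ARRAY_BYTES = 1e15

# Number of complete reads of the array per undetected error (about 4).
reads_per_error = BLOCKS_PER_ERROR * BLOCK_BYTES / ARRAY_BYTES

print(f"{bits_per_error:.2e} bits per undetected error")
print(f"{reads_per_error:.1f} full array reads per undetected error")
```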

>
>>
>> There is always a chance of undetected read errors - the question is if
>> the chances of such read errors, and the consequences of them, justify
>> the costs of extra checking.  And if they /do/ justify extra checking,
>> are data checksums the right way?  I agree with Neil's post that
>> end-to-end checksums (such as CRCs in a gzip file, or GPG integrity
>> checks) are the best check when they are possible, but they are not
>> always possible because they are not transparent.
>
> Everything below block or filesystem level is too late. Just remember,
> writing not a complete stripe implies reads in order to update the p and
> q parity blocks. So even if your application could later on detect that
> (Do your applications usually verify checksums?  In HPC I don't know of
> a single application to do that...), file system meta data already would
> be broken.
>

When you say "below block or filesystem level", I presume you mean
something like "application level"?  I always think of that as above
the filesystem, which is in turn above the block level.  I certainly
agree that it is often not practical to verify checksums at the
application level.
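
The partial-stripe write mentioned in the quoted text can be sketched as
follows, to show how an undetected read error ends up in the parity.
This is a simplified illustration only: it uses a single XOR parity
block rather than the p and q blocks referred to above, and the block
contents and names are made up.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical single-parity stripe on three data devices.
d0, d1, d2 = b"\x11" * 4, b"\x22" * 4, b"\x33" * 4
parity = xor(xor(d0, d1), d2)

# Partial-stripe write to device 1: read the old data, then fold the
# difference between old and new data into the parity block.
new_d1 = b"\x44" * 4
old_d1_as_read = b"\x22\x22\x20\x22"   # undetected read error: one byte wrong
parity = xor(xor(parity, old_d1_as_read), new_d1)
d1 = new_d1

# Parity no longer matches the stripe, although no device reported an error.
stripe_consistent = xor(xor(d0, d1), d2) == parity

# If device 0 later fails, its reconstruction inherits the earlier error.
rebuilt_d0 = xor(xor(d1, d2), parity)
```

Here `rebuilt_d0` differs from the original `d0` in the third byte, so
the damage lands in a block that was never read incorrectly, and a
checksum kept by the application cannot prevent it.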

As I mentioned in another post, I think there are times when filesystem 
checksumming can make sense.  I also described another idea at block 
level - I am curious as to what you think of that.

mvh.,

David




