public inbox for linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs and ECC RAM
Date: Mon, 20 Jan 2014 16:13:18 +0000 (UTC)	[thread overview]
Message-ID: <pan$36f1b$8300c6a3$5960d52c$7e050399@cox.net> (raw)
In-Reply-To: <96091903-E1B0-4455-9EDC-EF94EE2E5110@aei.mpg.de>

Ian Hinder posted on Mon, 20 Jan 2014 15:57:42 +0100 as excerpted:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449,
> they talk about reconstructing corrupted data from parity information:
> 
>> Ok, no problem. ZFS will check against its parity. Oops, the parity
>> failed since we have a new corrupted bit. Remember, the checksum data
>> was calculated after the corruption from the first memory error
>> occurred. So now the parity data is used to "repair" the bad data. So
>> the data is "fixed" in RAM.
> 
> i.e. that there is parity information stored with every piece of data,
> and ZFS will "correct" errors automatically from the parity information.
> I start to suspect that there is confusion here between checksumming
> for data integrity and parity information.  If this is really how ZFS
> works, then if memory corruption interferes with this process, then I
> can see how a scrub could be devastating.  I don't know if ZFS really
> works like this.  It sounds very odd to do this without an additional
> checksum check.

Good point on the difference between parity and checksumming.
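
To make the distinction concrete, here's a toy Python sketch (not actual
ZFS or btrfs code, just an illustration): XOR parity can rebuild a
missing block, but on its own it can't say which block went bad, while a
per-block checksum can pinpoint the corruption:

```python
import hashlib
from functools import reduce

def xor_parity(blocks):
    # RAID5-style parity: byte-wise XOR across all data blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Reconstruction: XOR the parity with the surviving blocks to recover
# a known-missing block (e.g. a failed device).
lost = xor_parity([parity, data[1], data[2]])
assert lost == data[0]

# But flip a bit in one block without knowing which...
corrupt = [data[0], b"BBBA", data[2]]
# ...and the parity check only says "something is inconsistent",
# not which block is wrong:
assert xor_parity(corrupt) != parity

# A per-block checksum, by contrast, identifies the bad block:
sums = [hashlib.sha256(b).hexdigest() for b in data]
bad = [i for i, b in enumerate(corrupt)
       if hashlib.sha256(b).hexdigest() != sums[i]]
assert bad == [1]
```

So parity gives you reconstruction of a *known* missing block, while
checksums give you detection of an *unknown* bad one; the two are
complementary, not interchangeable.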

I've absolutely no confirmation of this, but privately I've begun to 
wonder whether this difference has anything to do with the delay in 
getting "complete" raid5/6 support in btrfs, including scrub.  Perhaps 
once the devs actually started working with it, they realized that the 
traditional parity solution of raid5/6 didn't mesh so well with 
checksumming: in an ungraceful shutdown/crash, if the crash happened at 
just the wrong point, parity and checksumming could actually fight each 
other, such that restoring one (presumably parity, since it's the lower 
level, closer to the metal) triggered a failure of the other (presumably 
the checksumming, sitting above the parity).
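
A toy sketch of the scenario I'm imagining (again, purely illustrative
Python, assuming checksums sit above the parity layer; this is a
hypothetical model, not how btrfs is actually implemented): the classic
raid5 "write hole" leaves parity stale, and a subsequent parity-based
repair then produces data the checksum layer rejects:

```python
import hashlib

def checksum(b):
    return hashlib.sha256(b).hexdigest()

def xor(a, b):
    # Byte-wise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))

# A stripe written cleanly before the crash: two data blocks,
# their checksums, and the parity block.
d0, d1 = b"old0", b"old1"
parity = xor(d0, d1)
sums = [checksum(d0), checksum(d1)]

# Crash mid-update: d0 and its checksum were rewritten, but the
# matching parity update never hit disk (the raid5 "write hole").
d0 = b"new0"
sums[0] = checksum(d0)

# The device holding d0 then fails, and scrub rebuilds it from the
# stale parity:
rebuilt = xor(parity, d1)   # yields the old contents, not b"new0"

# The parity layer is now internally consistent, but the checksum
# layer rejects the "repaired" block: the two mechanisms disagree.
assert checksum(rebuilt) != sums[0]
```

Whether anything like this is what's actually holding up btrfs raid5/6
is pure speculation on my part, but it shows how the two layers could
end up contradicting each other after an ill-timed crash.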

That could trigger all sorts of issues that I suppose are solvable in 
theory, but said theory is well beyond me, and it could well invite 
complex coding issues that are incredibly difficult to resolve in a 
satisfactory way, thus the hiccup in getting /complete/ btrfs raid5/6 
support, even though the basic parity calculation and write-out has been 
in-kernel for several kernel cycles already, and was available as patches 
well before that.

If that's correct (and again, I've absolutely nothing but the delay and 
personal intuition to back it up; the delay by itself means little, as 
most btrfs features have taken longer to complete than originally 
planned, leaving btrfs as a whole years behind its original, as it 
turned out wildly optimistic, schedule; and since I'm not a dev, my 
intuition should mean approximately nothing to anyone else, so take it 
for what it's worth...), then we may ultimately end up with btrfs 
raid5/6 modes that are declared usable, but that come with weaker 
integrity and checksumming guarantees (particularly across device 
failure and replace) than those that normally apply to btrfs in other 
configurations.  At least for the btrfs initially considered stable.  
Perhaps a few years down the road a more advanced btrfs raid5/6 
implementation, with better integrity/checksumming guarantees, would 
become available.

Perhaps zfs has a similar parity mode, as opposed to real checksumming, 
but has real checksumming in other modes?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Thread overview: 23+ messages
2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
2014-01-18  0:49 ` cwillu
2014-01-18  1:10 ` George Mitchell
2014-01-18  7:16 ` Duncan
2014-01-19 19:02   ` Martin Steigerwald
2014-01-19 20:20     ` George Mitchell
2014-01-19 20:54       ` Duncan
2014-01-24 23:57       ` Russell Coker
2014-01-25  4:34         ` Duncan
2014-01-19 21:32     ` Duncan
2014-01-20  0:17 ` George Eleftheriou
2014-01-20  3:13   ` Austin S Hemmelgarn
2014-01-20 14:57     ` Ian Hinder
2014-01-20 15:36       ` Bob Marley
2014-01-20 16:04         ` Austin S Hemmelgarn
2014-01-20 16:08         ` George Mitchell
2014-01-25  0:45           ` Chris Murphy
2014-01-27 16:08             ` Calvin Walton
2014-01-27 16:42               ` Chris Murphy
2014-01-20 16:13       ` Duncan [this message]
2014-01-20 15:55     ` Fajar A. Nugraha
2014-01-23 16:00   ` David Sterba
  -- strict thread matches above, loose matches on Subject: below --
2014-01-20 15:27 Ian Hinder
