btrfs and ECC RAM

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* btrfs and ECC RAM
@ 2014-01-18  0:23 Ian Hinder
  2014-01-18  0:49 ` cwillu
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Ian Hinder @ 2014-01-18  0:23 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have been reading a lot of articles online about the dangers of using ZFS with non-ECC RAM.  Specifically, the fact that when good data is read from disk and compared with its checksum, a RAM error can cause the read data to be incorrect, causing a checksum failure, and the bad data might now be written back to the disk in an attempt to correct it, corrupting it in the process.  This would be exacerbated by a scrub, which could run through all your data and potentially corrupt it.  There is a strong current of opinion that using ZFS without ECC RAM is "suicide for your data".

I have been unable to find any discussion of the extent to which this is true for btrfs.  Does btrfs handle checksum errors in the same way as ZFS, or does it perform additional checks before writing "corrected" data back to disk?  For example, if it detects a checksum error, it could read the data again to a different memory location to determine if the error existed in the disk copy or the memory.

>From what I've been reading, it sounds like ZFS should not be used with non-ECC RAM.  This is reasonable, as ZFS' resource requirements mean that you probably only want to run it on server-grade hardware anyway.  But with btrfs eventually being the default filesystem for Linux, that would mean that all linux machines, even cheap consumer-grade hardware, would need ECC RAM, or forego many of the advantages of btrfs.

What is the situation?

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
@ 2014-01-18  0:49 ` cwillu
  2014-01-18  1:10 ` George Mitchell
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: cwillu @ 2014-01-18  0:49 UTC (permalink / raw)
  To: Ian Hinder; +Cc: linux-btrfs

On Fri, Jan 17, 2014 at 6:23 PM, Ian Hinder <ian.hinder@aei.mpg.de> wrote:
> Hi,
>
> I have been reading a lot of articles online about the dangers of using ZFS with non-ECC RAM.  Specifically, the fact that when good data is read from disk and compared with its checksum, a RAM error can cause the read data to be incorrect, causing a checksum failure, and the bad data might now be written back to the disk in an attempt to correct it, corrupting it in the process.  This would be exacerbated by a scrub, which could run through all your data and potentially corrupt it.  There is a strong current of opinion that using ZFS without ECC RAM is "suicide for your data".

That sounds entirely silly:  a scrub will only write data to the disk
that has actually passed a checksum. In order for that to corrupt
something on disk, you'd have to have a perfect storm of correct and
corrupt reads, and in every such case thta I can think of, you'd be
worse off without checksums than if you had them.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
  2014-01-18  0:49 ` cwillu
@ 2014-01-18  1:10 ` George Mitchell
  2014-01-18  7:16 ` Duncan
  2014-01-20  0:17 ` George Eleftheriou
  3 siblings, 0 replies; 23+ messages in thread
From: George Mitchell @ 2014-01-18  1:10 UTC (permalink / raw)
  To: linux-btrfs

On 01/17/2014 04:23 PM, Ian Hinder wrote:
> Hi,
>
> I have been reading a lot of articles online about the dangers of using ZFS with non-ECC RAM.  Specifically, the fact that when good data is read from disk and compared with its checksum, a RAM error can cause the read data to be incorrect, causing a checksum failure, and the bad data might now be written back to the disk in an attempt to correct it, corrupting it in the process.  This would be exacerbated by a scrub, which could run through all your data and potentially corrupt it.  There is a strong current of opinion that using ZFS without ECC RAM is "suicide for your data".
>
> I have been unable to find any discussion of the extent to which this is true for btrfs.  Does btrfs handle checksum errors in the same way as ZFS, or does it perform additional checks before writing "corrected" data back to disk?  For example, if it detects a checksum error, it could read the data again to a different memory location to determine if the error existed in the disk copy or the memory.
>
> >From what I've been reading, it sounds like ZFS should not be used with non-ECC RAM.  This is reasonable, as ZFS' resource requirements mean that you probably only want to run it on server-grade hardware anyway.  But with btrfs eventually being the default filesystem for Linux, that would mean that all linux machines, even cheap consumer-grade hardware, would need ECC RAM, or forego many of the advantages of btrfs.
>
> What is the situation?
>
This subject is already being discussed on this list under the title of 
"drawbacks of non-ECC RAM".  So I suggest you follow that thread back 
and THEN come back with any further questions.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
  2014-01-18  0:49 ` cwillu
  2014-01-18  1:10 ` George Mitchell
@ 2014-01-18  7:16 ` Duncan
  2014-01-19 19:02   ` Martin Steigerwald
  2014-01-20  0:17 ` George Eleftheriou
  3 siblings, 1 reply; 23+ messages in thread
From: Duncan @ 2014-01-18  7:16 UTC (permalink / raw)
  To: linux-btrfs

Ian Hinder posted on Sat, 18 Jan 2014 01:23:41 +0100 as excerpted:

> I have been reading a lot of articles online about the dangers of using
> ZFS with non-ECC RAM.  Specifically, the fact that when good data is
> read from disk and compared with its checksum, a RAM error can cause the
> read data to be incorrect, causing a checksum failure, and the bad data
> might now be written back to the disk in an attempt to correct it,
> corrupting it in the process.  This would be exacerbated by a scrub,
> which could run through all your data and potentially corrupt it.  There
> is a strong current of opinion that using ZFS without ECC RAM is
> "suicide for your data".
> 
> I have been unable to find any discussion of the extent to which this is
> true for btrfs.  Does btrfs handle checksum errors in the same way as
> ZFS, or does it perform additional checks before writing "corrected"
> data back to disk?  For example, if it detects a checksum error, it
> could read the data again to a different memory location to determine
> if the error existed in the disk copy or the memory.

Given the license issues around zfs and linux, zfs is a non-starter for 
me here, and as a result I've never looked particularly closely at how it 
works, so I can't really say what it does with checksums or how that 
compares to btrfs.

I /can/ however say that btrfs does /not/ work the way described above, 
however.

When reading data from disk, btrfs will check the checksum.  If it shows 
up as bad and btrfs has another copy of the data available (as it will in 
dup, raid1 or raid10 mode, but not in single or raid0 mode, I'm not 
actually sure how the newer and still not fully complete raid5 and raid6 
modes work in that regard), btrfs will read the other copy and see if 
that matches the checksum.  If it does, the good copy is used and the bad 
copy is rewritten.  If no good copy exists, btrfs fails the read.

So while I don't know how zfs works and whether your scenario of 
rewriting bad data due to checksum failure could happen there or not, it 
can't happen with btrfs, because btrfs will only rewrite the data if it 
has another copy that matches the checksum.  Otherwise it (normally) 
fails the read entirely.  

It is possible to turn off btrfs checksumming entirely with a mount 
option, or to turn off both COW and checksumming on an individual file 
using xattributes, but that's definitely not recommended in general (tho 
it is on specific types of files, generally large internal-write files 
that otherwise end up hugely fragmented due to COW).

As George Mitchell mentions in his followup, there's another thread 
discussing ECC memory and btrfs already.  However, the OP in that thread 
didn't explain the alleged problem with zfs (which again, I've no idea 
whether it's true or not, since due to the licensing issues zfs is a flat 
non-starter for me so I've never looked into it that closely) in that 
regard, so all we were able to say was that ECC and btrfs aren't related 
in that way.  At least here you explained a bit about the alleged 
problem, so we can say for sure that btrfs doesn't work that way.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-18  7:16 ` Duncan
@ 2014-01-19 19:02   ` Martin Steigerwald
  2014-01-19 20:20     ` George Mitchell
  2014-01-19 21:32     ` Duncan
  0 siblings, 2 replies; 23+ messages in thread
From: Martin Steigerwald @ 2014-01-19 19:02 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

Am Samstag, 18. Januar 2014, 07:16:42 schrieb Duncan:
> Ian Hinder posted on Sat, 18 Jan 2014 01:23:41 +0100 as excerpted:
> > I have been reading a lot of articles online about the dangers of using
> > ZFS with non-ECC RAM.  Specifically, the fact that when good data is
> > read from disk and compared with its checksum, a RAM error can cause the
> > read data to be incorrect, causing a checksum failure, and the bad data
> > might now be written back to the disk in an attempt to correct it,
> > corrupting it in the process.  This would be exacerbated by a scrub,
> > which could run through all your data and potentially corrupt it.  There
> > is a strong current of opinion that using ZFS without ECC RAM is
> > "suicide for your data".
> > 
> > I have been unable to find any discussion of the extent to which this is
> > true for btrfs.  Does btrfs handle checksum errors in the same way as
> > ZFS, or does it perform additional checks before writing "corrected"
> > data back to disk?  For example, if it detects a checksum error, it
> > could read the data again to a different memory location to determine
> > if the error existed in the disk copy or the memory.
> 
> Given the license issues around zfs and linux, zfs is a non-starter for
> me here, and as a result I've never looked particularly closely at how it
> works, so I can't really say what it does with checksums or how that
> compares to btrfs.
> 
> I /can/ however say that btrfs does /not/ work the way described above,
> however.
> 
> When reading data from disk, btrfs will check the checksum.  If it shows
> up as bad and btrfs has another copy of the data available (as it will in
> dup, raid1 or raid10 mode, but not in single or raid0 mode, I'm not
> actually sure how the newer and still not fully complete raid5 and raid6
> modes work in that regard), btrfs will read the other copy and see if
> that matches the checksum.  If it does, the good copy is used and the bad
> copy is rewritten.  If no good copy exists, btrfs fails the read.
> 
> So while I don't know how zfs works and whether your scenario of
> rewriting bad data due to checksum failure could happen there or not, it
> can't happen with btrfs, because btrfs will only rewrite the data if it
> has another copy that matches the checksum.  Otherwise it (normally)
> fails the read entirely.

I think Ian refers to the slight chance that BTRFS assumes the checksum on one 
disk to be incorrect due to a memory error *and* on another disk to be correct 
due to another memory error *and* will silently rewrite the incorrect data to 
the correct data.

AFAIK BTRFS still does not correct such errors automatically, but only on a 
scrub. There this *could* happen theoretically.

My gut feeling is, that this is highly, highly unlikely.

At least not more likely than a controller writing out garbage or other such 
hardware issues.

And for hardware issues there are backups.

I´d probably like if all computers had ECC RAM, but then I heard more than 
once that ECC doesn´t even detect all possible memory errors.

Maybe at one point the kernel will be able to checksum memory pages itself?

Actually I only once had a memory error in a machine which went completely 
undetected under Windows XP, but let Debian and Ubuntu installers segfault at 
random places. This was years ago. I have never notices a memory error since 
then and I am not aware of any co-workers having had memory errors on their 
laptops. But then… those are usually enterprise grade laptops, which to my 
knowledge nonetheless just use RAM without ECC. I don´t think that this 
ThinkPad T520 uses ECC RAM.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-19 19:02   ` Martin Steigerwald
@ 2014-01-19 20:20     ` George Mitchell
  2014-01-19 20:54       ` Duncan
  2014-01-24 23:57       ` Russell Coker
  2014-01-19 21:32     ` Duncan
  1 sibling, 2 replies; 23+ messages in thread
From: George Mitchell @ 2014-01-19 20:20 UTC (permalink / raw)
  Cc: linux-btrfs

Just my opinion, of course, but I simply cannot imagine how "an 
incorrect checksum could appear correct due to a memory error". Sorry, 
but I just cannot get my brain around that one.  The odds against it 
happening would be beyond comprehension.  I can easily imagine btrfs 
taking a system down due to memory error, but not btrfs causing data 
corruption due to a memory error.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-19 20:20     ` George Mitchell
@ 2014-01-19 20:54       ` Duncan
  2014-01-24 23:57       ` Russell Coker
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-01-19 20:54 UTC (permalink / raw)
  To: linux-btrfs

George Mitchell posted on Sun, 19 Jan 2014 12:20:22 -0800 as excerpted:

> Just my opinion, of course, but I simply cannot imagine how "an
> incorrect checksum could appear correct due to a memory error". Sorry,
> but I just cannot get my brain around that one.  The odds against it
> happening would be beyond comprehension.  I can easily imagine btrfs
> taking a system down due to memory error, but not btrfs causing data
> corruption due to a memory error.

What he said. =:^)

Seriously, the odds must be on the scale of hitting the lottery, if not 
further out.  And from what I've read, people have a better chance at 
getting hit by lightening than hitting the lottery.

So, umm... "y'all" draggin' around a lightening rod, with an insulated 
handle of course and keeping it touching earth at all times, just in case?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-19 20:20     ` George Mitchell
  2014-01-19 20:54       ` Duncan
@ 2014-01-24 23:57       ` Russell Coker
  2014-01-25  4:34         ` Duncan
  1 sibling, 1 reply; 23+ messages in thread
From: Russell Coker @ 2014-01-24 23:57 UTC (permalink / raw)
  To: george; +Cc: linux-btrfs

On Sun, 19 Jan 2014 12:20:22 George Mitchell wrote:
> I can easily imagine btrfs 
> taking a system down due to memory error, but not btrfs causing data 
> corruption due to a memory error.

I had a system which had apparently worked OK on Ext4 but had some memory 
errors.  After twice having a serious BTRFS corruption (needed backup-format-
restore) I ran memtest and found that a DIMM was broken.

In that sort of situation it seems that BTRFS is likely to be more fragile due 
to the more complex data structures and due to the fact that it's a newer 
filesystem with less corner cases handled.

That said, it's working well for me on a few systems without ECC RAM.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-24 23:57       ` Russell Coker
@ 2014-01-25  4:34         ` Duncan
  0 siblings, 0 replies; 23+ messages in thread
From: Duncan @ 2014-01-25  4:34 UTC (permalink / raw)
  To: linux-btrfs

Russell Coker posted on Sat, 25 Jan 2014 10:57:43 +1100 as excerpted:

> On Sun, 19 Jan 2014 12:20:22 George Mitchell wrote:
>> I can easily imagine btrfs taking a system down due to memory error,
>> but not btrfs causing data corruption due to a memory error.
> 
> I had a system which had apparently worked OK on Ext4 but had some
> memory errors.  After twice having a serious BTRFS corruption (needed
> backup-format- restore) I ran memtest and found that a DIMM was broken.
> 
> In that sort of situation it seems that BTRFS is likely to be more
> fragile due to the more complex data structures and due to the fact that
> it's a newer filesystem with less corner cases handled.

I believe that was his point (as it has been mine, elsewhere).  If there 
are memory errors, btrfs might well "take down the system", or as you 
mention, corrupt the btrfs such that it needs blown away with a mkfs and 
restore from backup, but it should NOT invisibly corrupt the data.  If 
the data is available, it should be correct as it was stored, memory 
issue or no memory issue, ECC DRAM or non-ECC DRAM.  (Tho there's a small 
chance that the data was corrupted in memory before it was stored, but 
that's the case no matter the filesystem, as the filesystem has nothing 
to do with it then.)

IOW, btrfs is binary.  The data is either there and valid, or its not 
there, period.  There's no there but not valid case -- btrfs simply won't 
serve the data period if it's invalid and there's no valid copy available 
to substitute for it.

And yes, that does make btrfs more brittle on defective equipment, 
because it'll break all the way much more frequently than filesystems 
that would instead serve only slightly invalid data, without knowing how 
valid it is because they don't do the checks btrfs does.

*CAVEAT!  This assumes the filesystem isn't mounted with nodatacow/
nodatasum, and that we're not talking about files with the NOCOW extended 
attribute set, thus specific-case disabling these important features of 
btrfs.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-19 19:02   ` Martin Steigerwald
  2014-01-19 20:20     ` George Mitchell
@ 2014-01-19 21:32     ` Duncan
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-01-19 21:32 UTC (permalink / raw)
  To: linux-btrfs

Martin Steigerwald posted on Sun, 19 Jan 2014 20:02:41 +0100 as excerpted:

> I´d probably like if all computers had ECC RAM, but then I heard more
> than once that ECC doesn´t even detect all possible memory errors.

Heh, don't I know it!  I had an original generation dual socket, 3-digit 
AMD Opteron system for many years (it finally bit the dust, bulging/
exploded capacitors, a bit over a year ago).  As all Opterons of that era 
it took registered/ecc RAM, but for a couple years I was running PC3200 
rated RAM that Just. Couldn't'. Quite. Stably. Take. its clock-rating.

At some point a BIOS upgrade gave me the ability to de-clock it slightly, 
and then it was "stable as stone", even with wait-cycles decreased a bit 
to partially make up for the lower clocking.  And later I upgraded memory 
and the new memory was fine at rated clock.

But it took me a long time to figure out what was wrong, because I 
couldn't imagine it was the memory due to the ECC, and memtest86+ passed 
it with flying colors as well.  But I guess memtest86+ checks memory cell 
stability but not really clock sensitivity (data in transit on the memory 
bus), and clock sensitivity is where the problem was.

Interestingly enough, one of the more common failures was when untarring 
a tarball.  Apparently that's checksummed, and every once in awhile it 
would fail that checksum.  At least that failure, unlike some of the 
other fortunately less frequent problems, was user-space-only and 
recoverable by simply running the untar once again.

What amazed me, however, was how thru all that reiserfs was stable as a 
rock.  Well, after data=ordered mode was introduced and became the 
default, anyway.  Before that, not so much. =:^\

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
                   ` (2 preceding siblings ...)
  2014-01-18  7:16 ` Duncan
@ 2014-01-20  0:17 ` George Eleftheriou
  2014-01-20  3:13   ` Austin S Hemmelgarn
  2014-01-23 16:00   ` David Sterba
  3 siblings, 2 replies; 23+ messages in thread
From: George Eleftheriou @ 2014-01-20  0:17 UTC (permalink / raw)
  To: Ian Hinder; +Cc: linux-btrfs

I have been wondering the same thing for quite some time after having
read this post (which makes a pretty clear case in favour of ECC
RAM)...

hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/

... and the ZFS on Linux FAQ
hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory

Moreover, the ZFS community seem to cite this article quite often:
hxxp://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf

Without having further knowledge on that matter, I tend to believe
(but I hope I'm wrong) that BTRFS is as vulnerable as ZFS to memory
errors. Since I upgraded recently, it's a bit too late for purchasing
ECC-capable infrastructure (change of CPU + motherboard + RAM) so I
just chose to ignore this risk by performing a memtest86 right before
every scrub (and having my regular backups ready). I've been using ZFS
on Linux for almost 5 months (having occasional issues with kernel
updates) until last week that I finally switched to BTRFS and I'm
happy.

As for the reliability of ECC RAM (from what I've read about it) it's
just that it corrects single-bit errors and it immediately halts the
system when it finds multi-bit errors.

On Sat, Jan 18, 2014 at 1:23 AM, Ian Hinder <ian.hinder@aei.mpg.de> wrote:
> Hi,
>
> I have been reading a lot of articles online about the dangers of using ZFS with non-ECC RAM.  Specifically, the fact that when good data is read from disk and compared with its checksum, a RAM error can cause the read data to be incorrect, causing a checksum failure, and the bad data might now be written back to the disk in an attempt to correct it, corrupting it in the process.  This would be exacerbated by a scrub, which could run through all your data and potentially corrupt it.  There is a strong current of opinion that using ZFS without ECC RAM is "suicide for your data".
>
> I have been unable to find any discussion of the extent to which this is true for btrfs.  Does btrfs handle checksum errors in the same way as ZFS, or does it perform additional checks before writing "corrected" data back to disk?  For example, if it detects a checksum error, it could read the data again to a different memory location to determine if the error existed in the disk copy or the memory.
>
> From what I've been reading, it sounds like ZFS should not be used with non-ECC RAM.  This is reasonable, as ZFS' resource requirements mean that you probably only want to run it on server-grade hardware anyway.  But with btrfs eventually being the default filesystem for Linux, that would mean that all linux machines, even cheap consumer-grade hardware, would need ECC RAM, or forego many of the advantages of btrfs.
>
> What is the situation?
>
> --
> Ian Hinder
> http://numrel.aei.mpg.de/people/hinder
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20  0:17 ` George Eleftheriou
@ 2014-01-20  3:13   ` Austin S Hemmelgarn
  2014-01-20 14:57     ` Ian Hinder
  2014-01-20 15:55     ` Fajar A. Nugraha
  2014-01-23 16:00   ` David Sterba
  1 sibling, 2 replies; 23+ messages in thread
From: Austin S Hemmelgarn @ 2014-01-20  3:13 UTC (permalink / raw)
  To: George Eleftheriou, Ian Hinder; +Cc: linux-btrfs

On 01/19/2014 07:17 PM, George Eleftheriou wrote:
> I have been wondering the same thing for quite some time after
> having read this post (which makes a pretty clear case in favour of
> ECC RAM)...
> 
> hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>
>  ... and the ZFS on Linux FAQ 
> hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory
> 
> Moreover, the ZFS community seem to cite this article quite often: 
> hxxp://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf
>
>  Without having further knowledge on that matter, I tend to
> believe (but I hope I'm wrong) that BTRFS is as vulnerable as ZFS
> to memory errors. Since I upgraded recently, it's a bit too late
> for purchasing ECC-capable infrastructure (change of CPU +
> motherboard + RAM) so I just chose to ignore this risk by
> performing a memtest86 right before every scrub (and having my
> regular backups ready). I've been using ZFS on Linux for almost 5
> months (having occasional issues with kernel updates) until last
> week that I finally switched to BTRFS and I'm happy.
AFAIK, ZFS does background data scrubbing without user intervention
(which on a separate note can make it a huge energy hog) to correct
on-disk errors.  For performance reasons though, it has no built-in
check to make sure that there really is an error, it just assumes that
if the checksum is wrong, the data on the disk must be wrong.  This is
fine for enterprise level hardware with ECC RAM, because the disk IS
more likely to be wrong in that case than the RAM is.  This assumption
falls apart though on commodity hardware (ie, no ECC RAM), hence the
warnings about using ZFS without ECC RAM.

BTRFS however works differently, it only scrubs data when you tell it
to.  If it encounters a checksum or read error on a data block, it
first tries to find another copy of that block elsewhere (usually on
another disk), if it still sees a wrong checksum there, or gets
another read error, or can't find another copy, then it returns a read
error to userspace, usually resulting in the program reading the data
crashing.  In most environments other than HA clustering, this is an
excellent compromise that still protects data integrity.

> As for the reliability of ECC RAM (from what I've read about it)
> it's just that it corrects single-bit errors and it immediately
> halts the system when it finds multi-bit errors.
> 
That is technically only true of server-grade ECC RAM, there are
higher order ECC memory systems that most people don't even know exist
(mostly because they are almost exclusively used in spacecraft,
nuclear reactors, and other high-radiation environments), and those
can correct multi-bit errors.  Statistically speaking though, the
chance with modern memory designs of getting more than one bit wrong
in the same cell is near zero unless the memory is failing, and of
course if the cell is failing, then ECC for that cell probably can't be
trusted either.  The solution to this of course is to replace any module
that has exceeded some percentage of the MTBF (I would usually say 80%,
but I have met people who replace them at 50%).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20  3:13   ` Austin S Hemmelgarn
@ 2014-01-20 14:57     ` Ian Hinder
  2014-01-20 15:36       ` Bob Marley
  2014-01-20 16:13       ` Duncan
  2014-01-20 15:55     ` Fajar A. Nugraha
  1 sibling, 2 replies; 23+ messages in thread
From: Ian Hinder @ 2014-01-20 14:57 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: George Eleftheriou, linux-btrfs

On 20 Jan 2014, at 04:13, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:

> On 01/19/2014 07:17 PM, George Eleftheriou wrote:
>> I have been wondering the same thing for quite some time after
>> having read this post (which makes a pretty clear case in favour of
>> ECC RAM)...
>> 
>> hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>> 
>> ... and the ZFS on Linux FAQ 
>> hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory
>> 
>> Moreover, the ZFS community seem to cite this article quite often: 
>> hxxp://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf
>> 
>> Without having further knowledge on that matter, I tend to
>> believe (but I hope I'm wrong) that BTRFS is as vulnerable as ZFS
>> to memory errors. Since I upgraded recently, it's a bit too late
>> for purchasing ECC-capable infrastructure (change of CPU +
>> motherboard + RAM) so I just chose to ignore this risk by
>> performing a memtest86 right before every scrub (and having my
>> regular backups ready). I've been using ZFS on Linux for almost 5
>> months (having occasional issues with kernel updates) until last
>> week that I finally switched to BTRFS and I'm happy.
> AFAIK, ZFS does background data scrubbing without user intervention
> (which on a separate note can make it a huge energy hog) to correct
> on-disk errors.  For performance reasons though, it has no built-in
> check to make sure that there really is an error, it just assumes that
> if the checksum is wrong, the data on the disk must be wrong.  This is
> fine for enterprise level hardware with ECC RAM, because the disk IS
> more likely to be wrong in that case than the RAM is.  This assumption
> falls apart though on commodity hardware (ie, no ECC RAM), hence the
> warnings about using ZFS without ECC RAM.

In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449, they talk about reconstructing corrupted data from parity information:

> Ok, no problem. ZFS will check against its parity. Oops, the parity failed since we have a new corrupted bit. Remember, the checksum data was calculated after the corruption from the first memory error occurred. So now the parity data is used to "repair" the bad data. So the data is "fixed" in RAM.

i.e. that there is parity information stored with every piece of data, and ZFS will "correct" errors automatically from the parity information.  I start to suspect that there is confusion here between checksumming for data integrity and parity information.  If this is really how ZFS works, then if memory corruption interferes with this process, then I can see how a scrub could be devastating.  I don't know if ZFS really works like this.  It sounds very odd to do this without an additional checksum check.  This sounds very different to what you say below that btrfs does, which is only to check against redundantly-stored copies, which I agree sounds much safer.  

The second link above from the ZFS FAQ just says that if you place a very high value on data integrity, you should be using ECC memory anyway, which I'm sure we can all agree with.

hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory:

> 1.16 Do I have to use ECC memory for ZFS?
> Using ECC memory for ZFS is strongly recommended for enterprise environments where the strongest data integrity guarantees are required. Without ECC memory rare random bit flips caused by cosmic rays or by faulty memory can go undetected. If this were to occur ZFS (or any other filesystem) will write the damaged data to disk and be unable to automatically detect the corruption.

i.e. if the data is bad in RAM, the checksumming in ZFS isn't going to help you, but it also isn't going to make things worse.  It doesn't say that a scrub can kill all your data, like the previous link does.  

In hxxp://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf, they essentially say that memory errors can cause problems, and it would be nice if filesystems used their checksumming to also checksum memory locations etc, which would help to detect and mitigate these.

On 20 Jan 2014, at 04:13, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:

> BTRFS however works differently, it only scrubs data when you tell it
> to.  If it encounters a checksum or read error on a data block, it
> first tries to find another copy of that block elsewhere (usually on
> another disk), if it still sees a wrong checksum there, or gets
> another read error, or can't find another copy, then it returns a read
> error to userspace, usually resulting in the program reading the data
> crashing.  In most environments other than HA clustering, this is an
> excellent compromise that still protects data integrity.

Yes, this sounds fine.

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20 14:57     ` Ian Hinder
@ 2014-01-20 15:36       ` Bob Marley
  2014-01-20 16:04         ` Austin S Hemmelgarn
  2014-01-20 16:08         ` George Mitchell
  2014-01-20 16:13       ` Duncan
  1 sibling, 2 replies; 23+ messages in thread
From: Bob Marley @ 2014-01-20 15:36 UTC (permalink / raw)
  To: linux-btrfs

On 20/01/2014 15:57, Ian Hinder wrote:
> i.e. that there is parity information stored with every piece of data, 
> and ZFS will "correct" errors automatically from the parity information. 

So this is not just parity data to check correctness but there are many 
more additional bits to actually correct these errors, based on an 
algorithm like reed-solomon ?

Where can I find information on how much "parity" is stored in ZFS ?

> I start to suspect that there is confusion here between checksumming 
> for data integrity and parity information. If this is really how ZFS 
> works, then if memory corruption interferes with this process, then I 
> can see how a scrub could be devastating. 

I don't . If you have additional bits to correct errors (other than 
detect errors), this will never be worse than having less of them.
All algorithms I know of, don't behave any worse if the erroneous bits 
are in the checksum part, or if the algorithm is correct+detect instead 
of just detect.
If the algorithm stores X+2Y extra bits (supposed ZFS case) in order to 
detect&correct Y erroneous bits and detect additional X erroneous bits, 
this will not be worse than having just X checksum bits (btrfs case).

So does ZFS really uses detect&correct parity? I'd expect this to be 
quite a lot computationally expensive

> I don't know if ZFS really works like this. It sounds very odd to do 
> this without an additional checksum check. This sounds very different 
> to what you say below that btrfs does, which is only to check against 
> redundantly-stored copies, which I agree sounds much safer. The second 
> link above from the ZFS FAQ just says that if you place a very high 
> value on data integrity, you should be using ECC memory anyway, which 
> I'm sure we can all agree with. 
> hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory:
>> 1.16 Do I have to use ECC memory for ZFS?
>> Using ECC memory for ZFS is strongly recommended for enterprise environments where the strongest data integrity guarantees are required. Without ECC memory rare random bit flips caused by cosmic rays or by faulty memory can go undetected. If this were to occur ZFS (or any other filesystem) will write the damaged data to disk and be unable to automatically detect the corruption.

The above sentence imho means that the data can get corrupted just prior 
to its first write.
This is obviously applicable to every filesystem on earth, without ECC, 
especially if it happens prior to the computation of the parity.

BM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20 15:36       ` Bob Marley
@ 2014-01-20 16:04         ` Austin S Hemmelgarn
  2014-01-20 16:08         ` George Mitchell
  1 sibling, 0 replies; 23+ messages in thread
From: Austin S Hemmelgarn @ 2014-01-20 16:04 UTC (permalink / raw)
  To: Bob Marley, linux-btrfs

On 2014-01-20 10:36, Bob Marley wrote:
> On 20/01/2014 15:57, Ian Hinder wrote:
>> i.e. that there is parity information stored with every piece of data,
>> and ZFS will "correct" errors automatically from the parity information. 
> 
> So this is not just parity data to check correctness but there are many
> more additional bits to actually correct these errors, based on an
> algorithm like reed-solomon ?
> 
> Where can I find information on how much "parity" is stored in ZFS ?
> 
>> I start to suspect that there is confusion here between checksumming
>> for data integrity and parity information. If this is really how ZFS
>> works, then if memory corruption interferes with this process, then I
>> can see how a scrub could be devastating. 
> 
> I don't . If you have additional bits to correct errors (other than
> detect errors), this will never be worse than having less of them.
> All algorithms I know of, don't behave any worse if the erroneous bits
> are in the checksum part, or if the algorithm is correct+detect instead
> of just detect.
> If the algorithm stores X+2Y extra bits (supposed ZFS case) in order to
> detect&correct Y erroneous bits and detect additional X erroneous bits,
> this will not be worse than having just X checksum bits (btrfs case).
> 
> So does ZFS really uses detect&correct parity? I'd expect this to be
> quite a lot computationally expensive
> 
>> I don't know if ZFS really works like this. It sounds very odd to do
>> this without an additional checksum check. This sounds very different
>> to what you say below that btrfs does, which is only to check against
>> redundantly-stored copies, which I agree sounds much safer. The second
>> link above from the ZFS FAQ just says that if you place a very high
>> value on data integrity, you should be using ECC memory anyway, which
>> I'm sure we can all agree with.
>> hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory:
>>> 1.16 Do I have to use ECC memory for ZFS?
>>> Using ECC memory for ZFS is strongly recommended for enterprise
>>> environments where the strongest data integrity guarantees are
>>> required. Without ECC memory rare random bit flips caused by cosmic
>>> rays or by faulty memory can go undetected. If this were to occur ZFS
>>> (or any other filesystem) will write the damaged data to disk and be
>>> unable to automatically detect the corruption.
> 
> The above sentence imho means that the data can get corrupted just prior
> to its first write.
> This is obviously applicable to every filesystem on earth, without ECC,
> especially if it happens prior to the computation of the parity.
> 
> BM
I apparently misunderstood what I had read about ZFS.  As for the parity
though, it's equivalent to RAID5, RAID6, or distributed striped
triple-parity RAID.  Looking further into ZFS itself, I'm starting to
wonder why ECC would be recommended for ZFS in cases other than using it
on a single disk, it should be able to handle SEU's as long as they
don't hit the executable code itself (it uses SHA256, which makes the
chances of a single-bit error going undetected astronomical in scale).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20 15:36       ` Bob Marley
  2014-01-20 16:04         ` Austin S Hemmelgarn
@ 2014-01-20 16:08         ` George Mitchell
  2014-01-25  0:45           ` Chris Murphy
  1 sibling, 1 reply; 23+ messages in thread
From: George Mitchell @ 2014-01-20 16:08 UTC (permalink / raw)
  To: linux-btrfs

After reading the recent posts on this topic I am beginning to think 
there is some real confusion between "check sums" and "parity". These 
are two different things which serve two different purposes. In each 
case, bad RAM would have different repercussions.  But I still fail to 
see how, in the case of either btrfs or zfs, bad RAM could damage data 
via the checksum process.  In fact, I would expect that checksum would 
guard against that very thing.  If not, checksum is very badly 
implemented.  Most of the world uses non-ECC RAM.  I simply cannot 
believe that a file system like zfs would expose the user to more risk 
than ext4.  I think there is some very inappropriate panic going on over 
this thing.  Just because one source has asserted that something like 
this can happen does not make it fact.  As it concerns zfs, it needs to 
be brought up at a zfs forum, not here.  As it concerns btrfs, I think 
it has been made clear by some of the sharpest contributors here that 
this is not going to pose any risk with btrfs.  If a btrfs checksum 
fails, btrfs is not going to change anything if it cannot find a copy 
that passes checksum, and bad RAM is not going to cause bad data to pass 
checksum.  But I CAN see how bad RAM could affect parity calculations 
and resulting data IN THE ABSENCE of protective checksums and cannot 
help but wonder if THAT is what the original article is describing.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20 16:08         ` George Mitchell
@ 2014-01-25  0:45           ` Chris Murphy
  2014-01-27 16:08             ` Calvin Walton
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2014-01-25  0:45 UTC (permalink / raw)
  To: Btrfs BTRFS

On Jan 20, 2014, at 9:08 AM, George Mitchell <george@chinilu.com> wrote:

> After reading the recent posts on this topic I am beginning to think there is some real confusion between "check sums" and "parity". 

Yes, I often see conventional raid6 assumed to always be capable of detecting and correcting corruption, not merely the ability to rebuild missing data of known location and length in a stripe (known either due to the drive reporting read error along with LBA; or a physical device becoming unavailable or unresponsive).

> But I CAN see how bad RAM could affect parity calculations and resulting data IN THE ABSENCE of protective checksums and cannot help but wonder if THAT is what the original article is describing.

Does Btrfs (and I'd presume ZFS) raid5/6 checksum both data and parity chunks? And in the case of raid6 are the two parities separately checksummed?

Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-25  0:45           ` Chris Murphy
@ 2014-01-27 16:08             ` Calvin Walton
  2014-01-27 16:42               ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Calvin Walton @ 2014-01-27 16:08 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Fri, 2014-01-24 at 17:45 -0700, Chris Murphy wrote:
> On Jan 20, 2014, at 9:08 AM, George Mitchell <george@chinilu.com> wrote:
> 
> > After reading the recent posts on this topic I am beginning to think
> there is some real confusion between "check sums" and "parity". 
> 
> Yes, I often see conventional raid6 assumed to always be capable of
> detecting and correcting corruption, not merely the ability to rebuild
> missing data of known location and length in a stripe (known either
> due to the drive reporting read error along with LBA; or a physical
> device becoming unavailable or unresponsive).
> 
> > But I CAN see how bad RAM could affect parity calculations and
> resulting data IN THE ABSENCE of protective checksums and cannot help
> but wonder if THAT is what the original article is describing.
> 
> Does Btrfs (and I'd presume ZFS) raid5/6 checksum both data and parity
> chunks? And in the case of raid6 are the two parities separately
> checksummed?

The checksumming in Btrfs is actually the other way around; the file
extents and filesystem metadata structures are checksummed before the
RAID parity is applied. The resulting blocks containing both data and
checksums are split over multiple drives with parity.

Presumably once scrub is implemented in btrfs RAID5/6, it will work by
first reconstructing the filesystem structure using data chunks, then
checking the checksum; if the checksum fails, it will attempt to
reconstruct the filesystem structure using the parity chunks instead of
data chunks and then re-check the checksum to confirm the reconstruction
was successful.

-- 
Calvin Walton <calvin.walton@kepstin.ca>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-27 16:08             ` Calvin Walton
@ 2014-01-27 16:42               ` Chris Murphy
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Murphy @ 2014-01-27 16:42 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Btrfs BTRFS


On Jan 27, 2014, at 9:08 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote:

> On Fri, 2014-01-24 at 17:45 -0700, Chris Murphy wrote:
>> On Jan 20, 2014, at 9:08 AM, George Mitchell <george@chinilu.com> wrote:
>> 
>>> After reading the recent posts on this topic I am beginning to think
>> there is some real confusion between "check sums" and "parity". 
>> 
>> Yes, I often see conventional raid6 assumed to always be capable of
>> detecting and correcting corruption, not merely the ability to rebuild
>> missing data of known location and length in a stripe (known either
>> due to the drive reporting read error along with LBA; or a physical
>> device becoming unavailable or unresponsive).
>> 
>>> But I CAN see how bad RAM could affect parity calculations and
>> resulting data IN THE ABSENCE of protective checksums and cannot help
>> but wonder if THAT is what the original article is describing.
>> 
>> Does Btrfs (and I'd presume ZFS) raid5/6 checksum both data and parity
>> chunks? And in the case of raid6 are the two parities separately
>> checksummed?
> 
> The checksumming in Btrfs is actually the other way around; the file
> extents and filesystem metadata structures are checksummed before the
> RAID parity is applied. The resulting blocks containing both data and
> checksums are split over multiple drives with parity.

Got it, thanks, that makes sense. Layout wise, I've read it's better for reduced latency if metadata profile is raid1 rather than raid5/6, when data profile is raid 5/6. I'm pretty sure with multiple devices if a metadata profile isn't explicitly stated at mkfs time, that it's assumed to be raid1, which sounds like a good default. With n mirrors, it may one day make sense to have the number of metadata copies scale automatically based on the number of devices (and always could be explicitly set or changed by the user).


Chris Murphy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20 14:57     ` Ian Hinder
  2014-01-20 15:36       ` Bob Marley
@ 2014-01-20 16:13       ` Duncan
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-01-20 16:13 UTC (permalink / raw)
  To: linux-btrfs

Ian Hinder posted on Mon, 20 Jan 2014 15:57:42 +0100 as excerpted:

> In hxxp://forums.freenas.org/threads/ecc-vs-non-ecc-ram-and-zfs.15449,
> they talk about reconstructing corrupted data from parity information:
> 
>> Ok, no problem. ZFS will check against its parity. Oops, the parity
>> failed since we have a new corrupted bit. Remember, the checksum data
>> was calculated after the corruption from the first memory error
>> occurred. So now the parity data is used to "repair" the bad data. So
>> the data is "fixed" in RAM.
> 
> i.e. that there is parity information stored with every piece of data,
> and ZFS will "correct" errors automatically from the parity information.
> I start to suspect that there is confusion here between checksumming
> for data integrity and parity information.  If this is really how ZFS
> works, then if memory corruption interferes with this process, then I
> can see how a scrub could be devastating.  I don't know if ZFS really
> works like this.  It sounds very odd to do this without an additional
> checksum check.

Good point on the difference between parity and checksumming.

I've absolutely no confirmation of this, but privately I've begun to 
wonder if this difference has anything at all to do with the delay in 
getting "complete" raid5/6 support in btrfs, including scrub; if once 
they actually started working with it, they realized that the traditional 
parity solution of raid5/6 didn't end up working out so well with 
checksumming, such that in an ungraceful shutdown/crash situation, if the 
crash happened at just the wrong point, if the parity and checksumming 
would actually fight each other, such that restoring one (presumably 
parity, since it'd be the lower level, closer to the metal) triggered a 
failure of the other (presumably the checksumming, existing above the 
parity).

That could trigger all sorts of issues that I suppose to be solvable in 
theory, but said theory is well beyond me, and could well invite complex 
coding issues that are incredibly difficult to resolve in a satisfactory 
way, thus the hiccup in getting /complete/ btrfs raid5/6 support, even 
when the basic parity calculation and write-out as been in-kernel for 
several kernel cycles already, and was available as patches from well 
before that.

If that's correct (and again, I've absolutely nothing but the delay and 
personal intuition to back it up, and the delay in itself means little, 
as it seems most btrfs features have taken longer to complete than 
originally planned, such that btrfs as a whole is now years behind the 
originally it turned out wildly optimistic plan, meaning it's simply 
personal intuition, and as I'm not a dev that should mean approximately 
nothing to anyone else, so take it for what it's worth...), then 
ultimately we may end up with btrfs raid5/6 modes that end up being 
declared usable, but that come with lower integrity and checksumming 
guarantees (particularly across device failure and replace) than those 
that normally apply to btrfs in other configurations.  At least for the 
btrfs initially considered stable.  Perhaps down the road a few years a 
more advanced btrfs raid5/6 implementation, with better integrity/
checksumming guarantees, would become available.

Perhaps zfs has a similar parity mode, as opposed to real checksumming, 
but has real checksumming in other modes?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20  3:13   ` Austin S Hemmelgarn
  2014-01-20 14:57     ` Ian Hinder
@ 2014-01-20 15:55     ` Fajar A. Nugraha
  1 sibling, 0 replies; 23+ messages in thread
From: Fajar A. Nugraha @ 2014-01-20 15:55 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Jan 20, 2014 at 10:13 AM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> AFAIK, ZFS does background data scrubbing without user intervention


No, it doesn't.

> BTRFS however works differently, it only scrubs data when you tell it
> to.  If it encounters a checksum or read error on a data block, it
> first tries to find another copy of that block elsewhere (usually on
> another disk), if it still sees a wrong checksum there, or gets
> another read error, or can't find another copy, then it returns a read
> error to userspace,


zfs does the same thing.

-- 
Fajar

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
  2014-01-20  0:17 ` George Eleftheriou
  2014-01-20  3:13   ` Austin S Hemmelgarn
@ 2014-01-23 16:00   ` David Sterba
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2014-01-23 16:00 UTC (permalink / raw)
  To: George Eleftheriou; +Cc: Ian Hinder, linux-btrfs

On Mon, Jan 20, 2014 at 01:17:08AM +0100, George Eleftheriou wrote:
> Without having further knowledge on that matter, I tend to believe
> (but I hope I'm wrong) that BTRFS is as vulnerable as ZFS to memory
> errors. Since I upgraded recently, it's a bit too late for purchasing
> ECC-capable infrastructure (change of CPU + motherboard + RAM) so I
> just chose to ignore this risk by performing a memtest86 right before
> every scrub (and having my regular backups ready).

Also, I haven't seen it mentioned in the thread, scrub can also run in
read-only mode:

  $ btrfs scrub start -r /mnt

so you can do a safe RO pass before a RW one and examine the reported
problems before they get fixed automatically.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs and ECC RAM
@ 2014-01-20 15:27 Ian Hinder
  0 siblings, 0 replies; 23+ messages in thread
From: Ian Hinder @ 2014-01-20 15:27 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

(apologies for messing up the threading; I thought I could get away with not subscribing.  I've subscribed now.)

> Martin Steigerwald <Martin <at> lichtvoll.de> wrote:Am Samstag, 18. Januar 2014, 07:16:42 schrieb :
> I think Ian refers to the slight chance that BTRFS assumes the checksum on one 
> disk to be incorrect due to a memory error 
> *and*
>  on another disk to be correct 
> due to another memory error 
> *and*
>  will silently rewrite the incorrect data to 
> the correct data.
> 
> AFAIK BTRFS still does not correct such errors automatically, but only on a 
> scrub. There this 
> *could*
>  happen theoretically.
> 
> My gut feeling is, that this is highly, highly unlikely.
> 
> At least not more likely than a controller writing out garbage or other such 
> hardware issues.

Actually, I hadn't fully understood this scenario; I was just asking because of what some of the ZFS people were saying.  

To clarify, what you describe could happen like this (is this what you meant?):

- Checksum is computed
- Checksum and data written to locations A and B, but location B suffers a memory corruption of the data en-route (maybe in some intermediate buffer) so is stored incorrectly on disk
- btrfs scrub then reads A, but suffers a memory error, and thinks the good data is bad
- Hence B is read, but another memory error causes the checksum to pass
- Since the checksum passed, B is written to A, overwriting the data

This requires a collision in the checksumming algorithm, so I don't think we need to worry about this case. It's at least as what could happen by chance with a random disk error, but the chance is negligible.

Another possibility is that A and B are both correctly written.  A memory error then happen when reading A from disk, triggering a read of B, which passes its checksum.  B is then written to A, but another memory error in some buffer causes a corruption.  This doesn't require a checksum collision, just frequent memory errors.  But it could indeed lead to trashing the whole FS during a scrub if memory errors are sufficiently frequent perhaps?  If you have memory errors occurring sufficiently frequently that you get two errors while processing a single block during a scrub, then your memory is probably very far gone.  On the other hand, if btrfs is reusing the same two memory buffers for reads and writes, and you happen to have errors in those buffers, then maybe this isn't so unlikely?  This could maybe be mitigated by cancelling the scrub if there are too many errors requiring rewrites.  

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-01-27 16:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-18  0:23 btrfs and ECC RAM Ian Hinder
2014-01-18  0:49 ` cwillu
2014-01-18  1:10 ` George Mitchell
2014-01-18  7:16 ` Duncan
2014-01-19 19:02   ` Martin Steigerwald
2014-01-19 20:20     ` George Mitchell
2014-01-19 20:54       ` Duncan
2014-01-24 23:57       ` Russell Coker
2014-01-25  4:34         ` Duncan
2014-01-19 21:32     ` Duncan
2014-01-20  0:17 ` George Eleftheriou
2014-01-20  3:13   ` Austin S Hemmelgarn
2014-01-20 14:57     ` Ian Hinder
2014-01-20 15:36       ` Bob Marley
2014-01-20 16:04         ` Austin S Hemmelgarn
2014-01-20 16:08         ` George Mitchell
2014-01-25  0:45           ` Chris Murphy
2014-01-27 16:08             ` Calvin Walton
2014-01-27 16:42               ` Chris Murphy
2014-01-20 16:13       ` Duncan
2014-01-20 15:55     ` Fajar A. Nugraha
2014-01-23 16:00   ` David Sterba
  -- strict thread matches above, loose matches on Subject: below --
2014-01-20 15:27 Ian Hinder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox