From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Btrfs/RAID5 became unmountable after SATA cable fault
Date: Thu, 5 Nov 2015 04:06:52 +0000 (UTC) [thread overview]
Message-ID: <pan$f33d7$d1be1dd2$81273c9e$505ff3bb@cox.net> (raw)
In-Reply-To: 563A5251.70300@gmail.com
Austin S Hemmelgarn posted on Wed, 04 Nov 2015 13:45:37 -0500 as
excerpted:
> On 2015-11-04 13:01, Janos Toth F. wrote:
>> But the worst part is that there are some ISO files which were
>> seemingly copied without errors but their external checksums (the one
>> which I can calculate with md5sum and compare to the one supplied by
>> the publisher of the ISO file) don't match!
>> Well... this, I cannot understand.
>> How could these files become corrupt from a single disk failure? And
>> more importantly: how could these files be copied without errors? Why
>> didn't Btrfs give a read error when the checksums didn't add up?
> If you can prove that there was a checksum mismatch and BTRFS returned
> invalid data instead of a read error or going to the other disk, then
> that is a very serious bug that needs to be fixed. You need to keep in
> mind also however that it's completely possible that the data was bad
> before you wrote it to the filesystem, and if that's the case, there's
> nothing any filesystem can do to fix it for you.
As Austin suggests, if btrfs is returning data, and you haven't turned
off checksumming with nodatasum or nocow, then it's almost certainly
returning the data it was given to write out in the first place. Whether
that data was correct when it was handed to btrfs, however, is an
/entirely/ different matter.
If ISOs are failing their external checksums, then something is going
on. Had you verified the external checksums when you first got the
files? That is, are you sure the files were correct as downloaded and/or
ripped?
Where were the ISOs stored between original procurement/validation and
writing to btrfs? Is it possible you still have some/all of them on that
media? Do they still external-checksum-verify there?
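For what it's worth, if you still have the publisher's checksum lists
around, a bulk re-check is quick to script. Here's a minimal Python
sketch (hashlib is stdlib); it assumes md5sum-style lines of
"<hexdigest>  <filename>", which is just my guess at how your publisher
ships them -- adjust if they use sha256 or some other format:

    #!/usr/bin/env python3
    # Re-check files against a publisher-supplied checksum list.
    # Assumes md5sum-style lines: "<hexdigest>  <filename>".
    import hashlib
    import sys

    def file_md5(path, bufsize=1 << 20):
        h = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                h.update(chunk)
        return h.hexdigest()

    def main(sumfile):
        with open(sumfile) as f:
            for line in f:
                if not line.strip():
                    continue
                want, name = line.split(None, 1)
                name = name.strip()
                status = "OK " if file_md5(name) == want.lower() else "BAD"
                print(status, name)

    if __name__ == "__main__":
        main(sys.argv[1])

Run it against the copies on the old media and against the copies on
btrfs; if the old copy verifies and the btrfs copy doesn't, the
corruption happened somewhere along the move.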
Basically, assuming btrfs checksums are validating, there are three other
likely possibilities for where the corruption could have come from before
the data was written to btrfs: the files were bad as downloaded or
otherwise procured -- which is why I asked whether you verified them upon
receipt -- or your memory is going bad, or your temporary storage was
going bad, before the files ever got written to btrfs.
The memory going bad is a particularly worrying possibility,
considering...
>> Now I am really considering moving from Linux to Windows and from
>> Btrfs RAID-5 to Storage Spaces RAID-1 + ReFS (the only limitation is
>> that ReFS is only "self-healing" on RAID-1, not RAID-5, so I need a new
>> motherboard with more native SATA connectors and an extra HDD). That
>> one seemed to actually do what it promises (abort any read operation
>> upon a checksum error [checking always happens seamlessly on every
>> read] but look at the redundant data first and seamlessly "self-heal"
>> if possible). The only thing which made Btrfs look like a better
>> alternative was the RAID-5 support. But I recently experienced two
>> cases of 1 drive out of 3 failing, and it always turned out to be a
>> smaller or bigger disaster (completely lost data or inconsistent data).
> Have you considered looking into ZFS? I hate to suggest it as an
> alternative to BTRFS, but it's a much more mature and well tested
> technology than ReFS, and has many of the same features as BTRFS (and
> even has the option for triple parity instead of the double you get with
> RAID6). If you do consider ZFS, make a point to look at FreeBSD in
> addition to the Linux version; the BSD one is a much better-written
> port of the original Solaris drivers, and has better performance in many
> cases (and as much as I hate to admit it, BSD is way more reliable than
> Linux in most use cases).
>
> You should also seriously consider whether the convenience of having a
> filesystem that fixes internal errors itself with no user intervention
> is worth the risk of it corrupting your data. Returning correct data
> whenever possible is one thing, being 'self-healing' is completely
> different. Things that automatically fix internal errors without user
> intervention are exactly what makes most seasoned system
> administrators really nervous. Self-correcting systems
> have just as much chance to make things worse as they do to make things
> better, and most of them depend on the underlying hardware working
> correctly to actually provide any guarantee of reliability.
I too would point you at ZFS, but there's one VERY BIG caveat, and one
related smaller one!
The people who have a lot of ZFS experience say it's generally quite
reliable, but gobs of **RELIABLE** memory are *absolutely* *critical*!
The self-healing works well, *PROVIDED* memory isn't producing errors.
Absolutely reliable memory is in fact *so* critical, that running ZFS on
non-ECC memory is severely discouraged as a very real risk to your data.
Which is why the above hints that your memory may be bad are so
worrying. Don't even *THINK* about ZFS, particularly its self-healing
features, if you're not absolutely sure your memory is 100% reliable,
because apparently, based on the comments I've seen, if it's not, you
WILL have data loss, likely far worse than with btrfs under similar
circumstances. When btrfs detects a checksum error it tries another
copy if it has one (raid1/10 mode), and simply fails the read if it
doesn't. ZFS with self-healing activated, by contrast, will give you
what it thinks is the corrected data and write it back to repair the
problem as well -- but if memory is bad, it'll be self-damaging instead
of self-healing. From what I've read, that's actually a reasonably
common experience with non-ecc RAM, and it's the reason they so severely
discourage attempts to run zfs on non-ecc. But people still keep doing
it, and still keep getting burned as a result.
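To make the behavioral difference concrete, here's a toy Python sketch
of the two read paths as I understand them -- purely illustrative, with
made-up helpers (checksum_ok, rewrite_block), not anything resembling
the real code of either filesystem:

    import zlib

    def checksum_ok(data, csum):
        # Stand-in for the real per-block checksum verification.
        return zlib.crc32(data) == csum

    def btrfs_style_read(copies, csum):
        # Try each redundant copy; return the first that verifies,
        # otherwise fail the read rather than hand back bad data.
        for data in copies:
            if checksum_ok(data, csum):
                return data
        raise IOError("checksum mismatch on every copy -- read fails")

    def zfs_style_read(copies, csum, rewrite_block):
        # Same search, plus "self-healing": write the good copy back
        # over the bad ones.  If RAM flips bits in 'data' between the
        # check and the write-back, the repair itself spreads the damage.
        for i, data in enumerate(copies):
            if checksum_ok(data, csum):
                for j in range(len(copies)):
                    if j != i:
                        rewrite_block(j, data)
                return data
        raise IOError("checksum mismatch on every copy")

Either way, bad RAM can feed garbage in before the checksum is ever
computed; the self-healing path just adds a way for that garbage to be
written back over copies that were still good on disk.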
(The smaller, in context, caveat is that zfs works best with /lots/ of
RAM, particularly when run on Linux, since it is designed around its own
cache system rather than the one Linux uses, and won't work without it,
so in effect with ZFS on Linux everything must be cached twice, upping
the memory requirements dramatically.)
(Tho I should mention, while not on zfs, I've actually had my own
problems with ECC RAM too. In my case, the RAM was certified to run at
speeds faster than it was actually reliable at, such that stored data,
which is what the ECC protects, was fine; the data was actually getting
damaged in transit to/from the RAM. On a lightly loaded system, such as
one running many memory tests or under normal desktop usage conditions,
the RAM was generally fine, no problems. But on a heavily loaded system,
such as when doing parallel builds (I run gentoo, which builds from
sources in order to get the higher level of option flexibility that
comes only when you can toggle build-time options), I'd often have
memory faults and my builds would fail.
The most common failure, BTW, was on tarball decompression, bunzip2 or
the like, since the tarballs contained checksums that were verified on
data decompression, and often they'd fail to verify.
Once I updated the BIOS to one that would let me set the memory speed
instead of using the speed the modules themselves reported, and
declocked the memory just one notch (this was DDR1; IIRC I declocked
from the PC3200 it was rated at to PC3000 speeds), not only was the
memory then 100% reliable, but I could and did actually reduce the
number of wait-states for various operations, and it was STILL 100%
reliable. It simply couldn't handle the raw speeds it was certified to
run at, is all, tho it did handle them well enough, enough of the time,
to make the problem far more difficult to diagnose and confirm than it
would have been had the problem appeared at low load as well.
As it happens, I was running reiserfs at the time, and it handled both
that hardware issue, and a number of others I've had, far better than I'd
have expected of /any/ filesystem, when the memory feeding it is simply
not reliable. Reiserfs metadata, in particular, seems incredibly
resilient in the face of hardware issues, and I lost far less data than I
might have expected, tho without checksums and with bad memory, I imagine
I had occasional bitflip corruption in files here or there that simply
went undetected. I still use reiserfs on my spinning
rust today, but it's not well suited to SSD, which is where I run btrfs.
But the point for this discussion is that just because it's ECC RAM
doesn't mean you can't have memory-related errors, just that if you do,
they're likely to be different errors, "transit errors", which tend to
go undetected by many memory checkers -- at least the ones that simply
check whether what was stored in a cell can be read back unchanged,
without ever pushing full-out memory bandwidth.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman