linux-btrfs.vger.kernel.org archive mirror
From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: Expected behavior of bad sectors on one drive in a RAID1
Date: Tue, 20 Oct 2015 15:59:24 -0400
Message-ID: <56269D1C.5080006@gmail.com>
In-Reply-To: <pan$6a87$d56eafce$4f04d0c$30b17c06@cox.net>


On 2015-10-20 15:20, Duncan wrote:
> Austin S Hemmelgarn posted on Tue, 20 Oct 2015 09:59:17 -0400 as
> excerpted:
>
>
>>>> It is worth clarifying also that:
>>>> a. While BTRFS will not return bad data in this case, it also won't
>>>> automatically repair the corruption.
>>>
>>> Really?  If so I think that's a bug in BTRFS.  When mounted rw I think
>>> that every time corruption is discovered it should be automatically
>>> fixed.
>> That's debatable.  While it is safer to try and do this with BTRFS than
>> say with MD-RAID, it's still not something many seasoned system
>> administrators would want happening behind their back.  It's worth
>> noting that ZFS does not automatically fix errors, it just reports them
>> and works around them, and many distributed storage options (like Ceph
>> for example) behave like this also.  All that the checksum mismatch
>> really tells you is that at some point, the data got corrupted, it could
>> be that the copy on the disk is bad, but it could also be caused by bad
>> RAM, a bad storage controller, a loose cable, or even a bad power
>> supply.
>
> There's a significant difference between btrfs in dup/raid1/raid10 modes
> anyway and some of the others you mentioned, however.  Btrfs in these
> modes actually has a second copy of the data itself available.  That's a
> world of difference compared to parity, for instance.  With parity you're
> reconstructing the data and thus have dangers such as the write hole, and
> the possibility of bad-ram corrupting the data before it was ever saved
> (this last one being the reason zfs has such strong recommendations/
> warnings regarding the use of non-ecc RAM, based on what a number of
> posters with zfs experience have said, here).  With btrfs, there's an
> actual second copy, with both copies covered by checksum.  If one of the
> copies verifies against its checksum and the other doesn't, the odds of
> the one that verifies being any worse than the one that doesn't are...
> pretty slim, to say the least.  (So slim I'd intuitively compare them to
> the odds of getting hit by lightning, tho I've no idea what the
> mathematically rigorous comparison might be.)
ZFS doesn't just do parity; it also does RAID1 and RAID10 (and RAID0, 
although I doubt that most people actually use that with ZFS), and Ceph 
uses n-way replication by default, not erasure coding (which is 
technically a superset of the parity algorithms used for RAID[56]).  In 
both cases they behave just like BTRFS: they log the error and fetch a 
good copy to return to userspace, but do not modify the copy with the 
error unless explicitly told to do so.
>
> Yes, there's some small but not infinitesimal chance the checksum may be
> wrong, but if there's two copies of the data and the checksum on one is
> wrong while the checksum on the other verifies... yes, there's still that
> small chance that the one that verifies is wrong too, but that it's any
> worse than the one that does not verify?  /That's/ getting close to
> infinitesimal, or at least close enough for the purposes of a mailing-
> list claim without links to supporting evidence by someone who has
> already characterized it as not mathematically rigorous... and for me,
> personally.  I'm not spending any serious time thinking about getting hit
> by lightening, either, tho by the same token I don't go out flying kites
> or waving long metal rods around in lightning storms, either.
With a 32-bit checksum and a 4k block, that's 32768 data bits plus 32 
checksum bits, or 32800 bits total.  A uniformly random single-bit error 
is equally likely to land in any one of those bits, which translates to 
an approximately 0.1% chance (32/32800) that it will occur in one of the 
checksum bits rather than in the data.  For a 16k block it's smaller of 
course (around 0.024%), but it's still frequent enough that it should be 
considered.
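That arithmetic is easy to sanity-check; here's a minimal sketch (the only inputs are the block size and the 32-bit checksum width, the function name is just for illustration):

```python
# Back-of-the-envelope check: probability that a uniformly random
# single-bit error lands in the checksum rather than in the data,
# for a block protected by a 32-bit checksum.
CSUM_BITS = 32

def p_flip_in_checksum(block_bytes: int) -> float:
    """Chance a single random bit flip hits the checksum bits."""
    total_bits = block_bytes * 8 + CSUM_BITS
    return CSUM_BITS / total_bits

for size in (4096, 16384):
    print(f"{size:>6}-byte block: {p_flip_in_checksum(size):.4%}")
```

The probability shrinks as the block grows, since the 32 checksum bits become a smaller fraction of the total.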
>
> Meanwhile, it's worth noting that btrfs itself isn't yet entirely stable
> or mature, and that the chances of just plain old bugs killing the
> filesystem are far *FAR* higher than of a verified-checksum copy being
> any worse than a failed-checksum copy.  If you're worried about that at
> this point, why are you even on the btrfs list in the first place?
Actually, the improved data safety relative to ext4 is just a bonus for 
me; my biggest reason for using BTRFS is the ease of reprovisioning 
(there are few other ways to move an entire system to new storage 
devices online with zero downtime).




Thread overview: 15+ messages
2015-10-20  4:16 Expected behavior of bad sectors on one drive in a RAID1 james harvey
2015-10-20  4:45 ` Russell Coker
2015-10-20 13:00   ` Austin S Hemmelgarn
2015-10-20 13:15     ` Russell Coker
2015-10-20 13:59       ` Austin S Hemmelgarn
2015-10-20 19:20         ` Duncan
2015-10-20 19:59           ` Austin S Hemmelgarn [this message]
2015-10-20 20:54             ` Tim Walberg
2015-10-21 11:51             ` Austin S Hemmelgarn
2015-10-21 12:07               ` Austin S Hemmelgarn
2015-10-21 16:01                 ` Chris Murphy
2015-10-21 17:28                   ` Austin S Hemmelgarn
2015-10-20 18:54 ` Duncan
2015-10-20 19:48   ` Austin S Hemmelgarn
2015-10-20 21:24     ` Duncan
