Re: BTRFS Data at Rest File Corruption

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: "Richard A. Lochner" <lochner@clone1.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS Data at Rest File Corruption
Date: Thu, 12 May 2016 14:29:17 -0400
Message-ID: <ebe609bb-3ce6-b929-97ef-ad323a254dc7@gmail.com>
In-Reply-To: <1463075341.3636.56.camel@clone1.com>

On 2016-05-12 13:49, Richard A. Lochner wrote:
> Austin,
>
> I rebooted the computer and reran the scrub to no avail.  The error is
> consistent.
>
> The reason I brought this question to the mailing list is because it
> seemed like a situation that might be of interest to the developers.
> Perhaps there might be a way to "defend" against this type of
> corruption.
>
> I suspected, and I still suspect that the error occurred upon a
> metadata update that corrupted the checksum for the file, probably due
> to silent memory corruption.  If the checksum was silently corrupted,
> it would be simply written to both drives causing this type of error.
That does seem to be the most likely cause and, sadly, it is not
something any filesystem can reliably protect against on commodity
hardware.
>
> With that in mind, I proved (see below) that the data blocks match on
> both mirrors.  This I expected since the data blocks should not have
> been touched as the file has not been written.
>
> This is the sequence of events as I see them that I think might be of
> interest to the developers.
>
> 1. A block containing a checksum for the file was read into memory.
> The block read would have been checksummed, so the checksum for the
> file must have been good at that moment.
It's worth noting that BTRFS doesn't verify all the checksums in a
metadata block when it loads that block; only the ones for the reads
that triggered the load get verified.
>
> 2. The checksum block was then altered in memory (perhaps to add or
> change a value).
>
> 3. A new checksum would then have been calculated for the checksum
> block.
>
> 4. The checksum block would have been written to both mirrors.
>
> Presumably, in the case that I am experiencing, an undetected memory
> error must have occurred after 1 and before step 3 was completed.
>
> I wonder if there is a way to correct or detect that situation.
The closest we could get is to provide an option to handle this in
scrub, preferably with a big scary warning on it, as this same situation
can easily be caused by someone modifying the disks themselves (we can't
reasonably protect against that, but we shouldn't make it trivial for
people to inject arbitrary data that way either).
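
To make the quoted four-step sequence concrete, here is a toy model of
it. This is only a sketch under loud assumptions: plain files stand in
for the two mirrors, POSIX cksum stands in for the filesystem's real
checksum algorithm, and the names (csum, verify, work) are made up for
the illustration. It shows why neither mirror can be used to repair the
other: the data is identical on both, and both fail against the same
damaged checksum.

```shell
#!/bin/sh
# Toy model only: files stand in for mirrors, and POSIX cksum stands in
# for the filesystem's real checksum algorithm.
csum() { cksum < "$1" | cut -d' ' -f1; }

# verify DATA CSUM_FILE: succeeds only if the stored checksum matches.
verify() { [ "$(csum "$1")" = "$(cat "$2")" ]; }

work=$(mktemp -d)

# The data block is never rewritten, so both mirrors hold the same bytes.
printf 'unchanged file data' > "$work/data.m1"
cp "$work/data.m1" "$work/data.m2"

good=$(csum "$work/data.m1")   # step 1: checksum is correct when read
bad=$((good ^ 1))              # step 2: one bit flips while it is in RAM
# Step 3 checksums the checksum block itself, which only protects the
# already-damaged value. Step 4 writes that value to both mirrors.
echo "$bad" > "$work/csum.m1"
echo "$bad" > "$work/csum.m2"

cmp -s "$work/data.m1" "$work/data.m2" && echo "data: mirrors identical"
verify "$work/data.m1" "$work/csum.m1" || echo "mirror 1: checksum error"
verify "$work/data.m2" "$work/csum.m2" || echo "mirror 2: checksum error"
```

Nothing in this model lets a scrub tell "good data, bad checksum" apart
from "bad data, good checksum", which is why any repair option would
have to trust the data blindly.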
>
> As I stated previously, the machine on which this occurred does not
> have ECC memory, however, I would not think that the majority of users
> running btrfs do either.  If it has happened to me, it likely has
> happened to others.
>
> Rick Lochner
>
> btrfs dmesg(s):
>
> [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
> 0, flush 0, corrupt 5, gen 0
> [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdb1
>
> [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
> 13, flush 0, corrupt 4, gen 0
> [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdc1
>
> How I compared the data blocks:
>
> #btrfs-map-logical -l 3037444042752  /dev/sdc1
> mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1
> mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1
> mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1
> mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1
>
> #dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s
>
> #dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s
>
> #dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s
>
> #dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s
>
> #diff b1 c1
> #diff b2 c2
Excellent thinking here.
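
For what it's worth, the same comparison can be done with block-sized
reads instead of bs=1, since the physical offsets reported by
btrfs-map-logical above are all multiples of 4096. The following is a
minimal sketch, not a tested recovery tool; read_block is a hypothetical
helper name, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# read_block DEVICE BYTE_OFFSET OUTFILE
# Copies one 4096-byte block starting at BYTE_OFFSET into OUTFILE.
# Refuses offsets that are not 4096-byte aligned rather than guessing.
read_block() {
    dev=$1; offset=$2; out=$3
    bs=4096
    if [ $((offset % bs)) -ne 0 ]; then
        echo "offset $offset is not ${bs}-byte aligned" >&2
        return 1
    fi
    dd if="$dev" of="$out" bs="$bs" skip=$((offset / bs)) count=1 2>/dev/null
}

# Cross-check: the sector numbers in the dmesg output, times 512, give
# the same physical byte offsets that btrfs-map-logical reported.
sdb_offset=$((4988789496 * 512))   # 2554260221952
sdc_offset=$((4988750584 * 512))   # 2554240299008

# Usage with the devices from this thread (run as root):
#   read_block /dev/sdc1 "$sdc_offset" c1
#   read_block /dev/sdb1 "$sdb_offset" b1
#   cmp b1 c1 && echo "mirrors identical"
```

cmp is used in the usage lines because it is meant for binary data; diff
printing nothing, as in the quoted run, tells you the same thing.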

Now, if you can find some external method to verify that that block is 
in fact correct, you can just write it back into the file itself at the 
correct offset, and fix the issue.
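
A rough sketch of that write-back, assuming the verified block has been
saved as a 4096-byte file (c1 from the dd run above would do) and that
the offset is the file offset from the dmesg output (75754369024, which
is a multiple of 4096). restore_block is a hypothetical helper and
/mnt/pool is a placeholder mount point, not something taken from this
thread. As I understand it, the overwrite should give that block a
freshly computed checksum, but take a backup of whatever you can first.

```shell
#!/bin/sh
# restore_block GOOD_BLOCK TARGET_FILE BYTE_OFFSET
# Overwrites one 4096-byte block of TARGET_FILE in place.
# conv=notrunc keeps dd from truncating the rest of the file.
restore_block() {
    good=$1; file=$2; offset=$3
    bs=4096
    if [ "$(wc -c < "$good")" -ne "$bs" ]; then
        echo "$good is not exactly $bs bytes" >&2
        return 1
    fi
    if [ $((offset % bs)) -ne 0 ]; then
        echo "offset $offset is not ${bs}-byte aligned" >&2
        return 1
    fi
    dd if="$good" of="$file" bs="$bs" seek=$((offset / bs)) count=1 \
        conv=notrunc 2>/dev/null
}

# Usage (placeholder mount point, path and offset from the dmesg output):
#   restore_block c1 /mnt/pool/Rick/sda4.img 75754369024
```

Rerunning the scrub afterwards would show whether the error is gone.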
