From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>,
"Richard A. Lochner" <lochner@clone1.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS Data at Rest File Corruption
Date: Mon, 16 May 2016 07:33:50 -0400
Message-ID: <41b097af-d565-6cd7-2ed8-cb66b9ae8ecc@gmail.com>
In-Reply-To: <CAJCQCtSYgfhmNYFE4ffxFy20B=trkh+P=hhdbrq71ysZSm_FEA@mail.gmail.com>

On 2016-05-16 02:07, Chris Murphy wrote:
> Current hypothesis
> "I suspected, and I still suspect that the error occurred upon a
> metadata update that corrupted the checksum for the file, probably due
> to silent memory corruption. If the checksum was silently corrupted,
> it would be simply written to both drives causing this type of error."
>
> A metadata update alone will not change the data checksums.
>
> But let's ignore that. If there's corrupt extent csum in a node that
> itself has a valid csum, this is functionally identical to e.g.
> nerfing 100 bytes of a file's extent data (both copies, identically).
> The fs doesn't know the difference. All it knows is the node csum is
> valid, therefore the data extent csum is valid, and that's why it
> assumes the data is wrong and hence you get an I/O error. And I can
> reproduce most of your results by nerfing file data.
>
> The entire dmesg for scrub looks like this:
>
>
> May 15 23:29:46 f23s.localdomain kernel: BTRFS warning (device dm-6):
> checksum error at logical 5566889984 on dev /dev/dm-6, sector 8540160,
> root 5, inode 258, offset 0, length 4096, links 1 (path:
> openSUSE-Tumbleweed-NET-x86_64-Current.iso)
> May 15 23:29:46 f23s.localdomain kernel: BTRFS error (device dm-6):
> bdev /dev/dm-6 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> May 15 23:29:46 f23s.localdomain kernel: BTRFS error (device dm-6):
> unable to fixup (regular) error at logical 5566889984 on dev /dev/dm-6
> May 15 23:29:46 f23s.localdomain kernel: BTRFS warning (device dm-6):
> checksum error at logical 5566889984 on dev /dev/mapper/VG-b1, sector
> 8579072, root 5, inode 258, offset 0, length 4096, links 1 (path:
> openSUSE-Tumbleweed-NET-x86_64-Current.iso)
> May 15 23:29:46 f23s.localdomain kernel: BTRFS error (device dm-6):
> bdev /dev/mapper/VG-b1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> May 15 23:29:46 f23s.localdomain kernel: BTRFS error (device dm-6):
> unable to fixup (regular) error at logical 5566889984 on dev
> /dev/mapper/VG-b1
>
> And the entire dmesg for running sha256sum on the file is
>
> May 15 23:33:41 f23s.localdomain kernel: __readpage_endio_check: 22
> callbacks suppressed
> May 15 23:33:41 f23s.localdomain kernel: BTRFS warning (device dm-6):
> csum failed ino 258 off 0 csum 3634944209 expected csum 1334657141
> May 15 23:33:41 f23s.localdomain kernel: BTRFS warning (device dm-6):
> csum failed ino 258 off 0 csum 3634944209 expected csum 1334657141
> May 15 23:33:41 f23s.localdomain kernel: BTRFS warning (device dm-6):
> csum failed ino 258 off 0 csum 3634944209 expected csum 1334657141
> May 15 23:33:41 f23s.localdomain kernel: BTRFS warning (device dm-6):
> csum failed ino 258 off 0 csum 3634944209 expected csum 1334657141
> May 15 23:33:41 f23s.localdomain kernel: BTRFS warning (device dm-6):
> csum failed ino 258 off 0 csum 3634944209 expected csum 1334657141
>
>
> And I do get an i/o error for sha256sum and no hash is computed.
>
> But there are two important differences:
> 1. I have two unable to fixup messages, one for each device, at the
> exact same time.
> 2. I altered both copies of extent data.
>
> It's a mystery to me how your file data has not changed, but somehow
> the extent csum was changed but also the node csum was recomputed
> correctly. That's a bit odd.
I would think this is entirely possible if some other file with a
checksum in that node was changed, forcing the node's own checksum to
be recomputed. A theoretical sequence of events:
1. Some file which has a checksum in node A gets written to.
2. Node A is loaded into memory to update the checksum.
3. The new checksum for the changed extent in the file gets updated in
the in-memory copy of node A.
4. Node A has its own checksum recomputed based on the new data, and
then gets saved to disk.
If something happened after step 2 but before step 4 that corrupted one
of the other checksums in node A, then the checksum computed in step 4
will have been computed over the corrupted data, so the node itself
still verifies as valid.
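The sequence above can be illustrated with a toy simulation. This is a
minimal sketch under stated assumptions, not how btrfs actually lays out
csum tree items: a node is modeled as a list of 32-bit CRC-32C values
(CRC-32C being the btrfs data checksum) serialized little-endian, with a
CRC-32C over that payload prepended as the node checksum. The helpers
write_node(), read_node() and simulate() and the single-bit flip are
hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def write_node(entries):
    """Hypothetical layout: node csum (u32) followed by per-block data csums (u32 each)."""
    payload = b"".join(struct.pack("<I", e) for e in entries)
    return struct.pack("<I", crc32c(payload)) + payload


def read_node(raw):
    """Return (node_csum_ok, entries) for a node produced by write_node()."""
    (stored,) = struct.unpack_from("<I", raw, 0)
    payload = raw[4:]
    entries = [e for (e,) in struct.iter_unpack("<I", payload)]
    return crc32c(payload) == stored, entries


def simulate(bit_flip: bool):
    other_block = b"B" * 4096  # unrelated file's data, never rewritten
    old_block, new_block = b"A" * 4096, b"C" * 4096

    # Steps 1-2: a file covered by node A is written; node A is loaded into RAM.
    entries = [crc32c(old_block), crc32c(other_block)]
    # Step 3: the csum for the rewritten extent is updated in memory.
    entries[0] = crc32c(new_block)
    if bit_flip:
        # Hypothetical memory error between steps 2 and 4 hits the OTHER file's csum.
        entries[1] ^= 1 << 7
    # Step 4: node csum is recomputed over the (possibly corrupted) contents and written.
    on_disk = write_node(entries)

    # Later: reading the unrelated file back through node A.
    node_ok, read_entries = read_node(on_disk)
    data_ok = crc32c(other_block) == read_entries[1]
    return node_ok, data_ok


print(simulate(False))  # (True, True)
print(simulate(True))   # (True, False)
```

With the bit flip, node A passes its own check while the unrelated
file's unchanged data fails verification. Since the same node contents
would be written to every mirror, each copy would report the same csum
error.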
Thread overview: 22+ messages
2016-05-11 18:36 BTRFS Data at Rest File Corruption Richard Lochner
2016-05-11 19:01 ` Roman Mamedov
2016-05-11 19:26 ` Austin S. Hemmelgarn
2016-05-12 17:49 ` Richard A. Lochner
2016-05-12 18:29 ` Austin S. Hemmelgarn
2016-05-12 21:53 ` Goffredo Baroncelli
2016-05-12 23:15 ` Richard A. Lochner
2016-05-13 1:41 ` Chris Murphy
2016-05-13 4:49 ` Richard A. Lochner
2016-05-13 17:46 ` Chris Murphy
2016-05-15 18:43 ` Richard A. Lochner
2016-05-16 6:07 ` Chris Murphy
2016-05-16 11:33 ` Austin S. Hemmelgarn [this message]
2016-05-16 21:20 ` Richard A. Lochner
2016-05-16 22:43 ` Chris Murphy
2016-05-16 23:44 ` Richard A. Lochner
2016-05-17 3:42 ` Chris Murphy
2016-05-17 11:26 ` Austin S. Hemmelgarn
2016-05-13 16:28 ` Goffredo Baroncelli
2016-05-13 16:54 ` Austin S. Hemmelgarn
2016-05-12 6:49 ` Chris Murphy
[not found] ` <CAAuLxcaQ1Uo+pff9AtD74UwUvo5yYKBuNLwKzjVMWV1kt2DcRQ@mail.gmail.com>
2016-05-12 18:26 ` Richard A. Lochner