From: "Richard A. Lochner" <lochner@clone1.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS Data at Rest File Corruption
Date: Thu, 12 May 2016 23:49:17 -0500
Message-ID: <1463114957.3636.140.camel@clone1.com>
In-Reply-To: <CAJCQCtSSbv5dAC-uBN9RnYKKRMtr04KmLZVzhvAh7=Xq3ej7dQ@mail.gmail.com>
Chris,
See notes inline.
On Thu, 2016-05-12 at 19:41 -0600, Chris Murphy wrote:
> On Thu, May 12, 2016 at 11:49 AM, Richard A. Lochner
> <lochner@clone1.com> wrote:
>
> >
> > I suspected, and I still suspect that the error occurred upon a
> > metadata update that corrupted the checksum for the file, probably
> > due
> > to silent memory corruption. If the checksum was silently
> > corrupted,
> > it would be simply written to both drives causing this type of
> > error.
> Metadata is checksummed independently of data. So if the data isn't
> updated, its checksum doesn't change, only metadata checksum is
> changed.
> >
> >
> > btrfs dmesg(s):
> >
> > [16510.334020] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr
> > 0, rd
> > 0, flush 0, corrupt 5, gen 0
> > [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdb1
> >
> > [17606.978439] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr
> > 0, rd
> > 13, flush 0, corrupt 4, gen 0
> > [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdc1
> This is confusing. Are these the same boot? The later time has a
> lower
> corrupt count. Can you just 'dd if=sda4.img of=/dev/null' and report
> all (new) messages in dmesg? It seems to me there should be pretty
> much all the same monotonic-time for the problem with both devices.
My apologies, they were from different boots. After the dd, I get
these:
[109479.550836] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.596626] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.601969] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602189] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602323] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
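All five messages report the same inode, offset, and checksum pair. A minimal Python sketch of how that could be confirmed, and how the byte offset maps to a 4096-byte block in the file, follows. The sample lines are copied from the output above and unwrapped onto one line each; the helper name parse_csum_failures and the hex formatting are illustrative assumptions, not part of any btrfs tool.

```python
import re

# Two of the dmesg lines from above, unwrapped onto one line each.
LOG = """\
[109479.550836] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
[109479.596626] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
"""

CSUM_RE = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>\d+) expected csum (?P<expected>\d+)"
)

# Block length taken from the "length 4096" field in the scrub messages quoted earlier.
BLOCK_SIZE = 4096


def parse_csum_failures(text):
    """Return a list of (ino, off, found, expected) integer tuples (hypothetical helper)."""
    keys = ("ino", "off", "found", "expected")
    return [tuple(int(m[k]) for k in keys) for m in CSUM_RE.finditer(text)]


failures = parse_csum_failures(LOG)
distinct = set(failures)  # repeated reports of the same block collapse to one entry

for ino, off, found, expected in sorted(distinct):
    print(
        f"inode {ino}: block {off // BLOCK_SIZE} (offset {off}), "
        f"found csum {found:#010x}, expected {expected:#010x}"
    )
```

With the block index in hand, a single-block read such as 'dd if=sda4.img bs=4096 skip=<block> count=1 of=/dev/null' should be enough to retrigger the message without reading the whole image.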
>
> Also what do you get for these for each device:
>
> smartctl -l scterc /dev/sdX
> cat /sys/block/sdX/device/timeout
>
# smartctl -l scterc /dev/sdb
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
# smartctl -l scterc /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
# cat /sys/block/sdb/device/timeout
30
# cat /sys/block/sdc/device/timeout
30
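The message does not say why these two values were requested. A common reason, assumed here, is to check that the drive's SCT Error Recovery Control limit is shorter than the kernel's SCSI command timer, so that the drive reports a read error before the kernel gives up on the command. A minimal Python sketch of that comparison, using the sdb values above (the function names are hypothetical):

```python
import re

# Values copied from the smartctl and sysfs output above.
SMARTCTL_SDB = """\
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
"""
KERNEL_TIMEOUT_SDB = "30\n"  # contents of /sys/block/sdb/device/timeout, in seconds


def parse_scterc(text):
    """Return (read, write) SCT ERC limits in seconds; smartctl reports deciseconds."""
    read = re.search(r"Read:\s+(\d+)", text)
    write = re.search(r"Write:\s+(\d+)", text)
    if read is None or write is None:
        raise ValueError("SCT ERC values not found (disabled or unsupported?)")
    return int(read.group(1)) / 10, int(write.group(1)) / 10


def drive_gives_up_first(scterc_text, timeout_text):
    """True if both SCT ERC limits are shorter than the kernel command timer."""
    read_s, write_s = parse_scterc(scterc_text)
    return max(read_s, write_s) < int(timeout_text.strip())


print(drive_gives_up_first(SMARTCTL_SDB, KERNEL_TIMEOUT_SDB))
```

For both drives here the limits are 7.0 seconds against a 30 second command timer, so under that assumption the settings are consistent.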
>