From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Przemek Klosowski <przemek.klosowski@gmail.com>,
linux-btrfs@vger.kernel.org
Subject: Re: Fwd: Fwd: HTML message rejected: btrfs checksum error
Date: Sat, 3 May 2025 18:10:17 +0930
Message-ID: <e822ec87-6603-4549-aef4-0e057be60a0d@gmx.com>
In-Reply-To: <CAC=1GgFJ8oDYgpX3T5FFv3bvVyDwkA-ce9c1TEugE=j9NTmHvg@mail.gmail.com>
On 2025/5/3 14:04, Przemek Klosowski wrote:
>> I guess the problem is you're still trying to scrub with that
>> rescue=idatacsums mount option?
>
> No--it's my root fs so I mounted the individual device as
> readonly/degraded, kind-of recursively in the same live fs. I just
> wanted to see if I can recover more data, and thought it'd be safe
> because of its readonly status.
In that case, unless the filesystem is mounted fully read-only with the
"rescue=idatacsums" mount option, btrfs will still verify data checksums,
which leads to the error messages and to reads of the affected blocks
being rejected.
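As a rough illustration of how to read those messages, the Python sketch
below collapses the repeated "csum failed" kernel lines into the distinct
(root, inode, offset) blocks they refer to. It is not part of btrfs-progs;
the function name failed_blocks is made up for this example, and it
assumes each journal message sits on a single line in the format quoted
further down in this thread.

```python
import re
from typing import Set, Tuple

# Matches kernel warnings such as:
#   csum failed root 257 ino 35328 off 26079232 csum 0x862b6025
#   expected csum 0xcf4a5572 mirror 1
# \s+ is used between tokens so that soft-wrapped lines still match.
CSUM_FAIL_RE = re.compile(
    r"csum\s+failed\s+root\s+(\d+)\s+ino\s+(\d+)\s+off\s+(\d+)\s+"
    r"csum\s+(0x[0-9a-f]+)\s+expected\s+csum\s+(0x[0-9a-f]+)\s+mirror\s+(\d+)"
)


def failed_blocks(log_text: str) -> Set[Tuple[int, int, int]]:
    """Return the distinct (root, inode, file offset) triples in the log."""
    return {
        (int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in CSUM_FAIL_RE.finditer(log_text)
    }
```

Fed the "csum failed" lines from the log quoted below, this yields a
single block: root 257, inode 35328, offset 26079232.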
>
> Thank you for the suggestions---after deleting the problem file, all
> is well (btrfs scrub shows no errors)
>
> By the way, I was perplexed by the tooling, because it seems that I need to run
> btrfs check --readonly --force --check-data-csum -p /dev/nvme0n1p2
> on a live filesystem (it's my root fs) to get the error locations, and
> it reports additional checksum errors due to the activity on the fs
> (mozilla cache files, etc).
>
> As you suggested, I extracted the offsets and piped to btrfs ins
> logical, which gave multiple filenames. I knew which one was the real
> one. I wonder if there's a way to avoid the fake ones?
Could you give some examples?
It can be a bug in the logical resolve implementation, which is part of
the kernel code.
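For reference, the extraction step discussed here could look roughly like
the following Python sketch. It pulls the unique bytenr values out of saved
"btrfs check --check-data-csum" output and hands each one to
"btrfs inspect-internal logical-resolve" (the long form of
`btrfs ins logical`). The helper names and the mount point argument are
assumptions for illustration; the resolve step needs root and a mounted
filesystem.

```python
import re
import subprocess
from typing import List

# "mirror 1 bytenr 299511672832 csum 0x... expected csum 0x..."
BYTENR_RE = re.compile(r"\bbytenr (\d+)")


def unique_bytenrs(check_output: str) -> List[int]:
    """Collect each reported bytenr once, in order of first appearance."""
    seen = set()
    ordered = []
    for match in BYTENR_RE.finditer(check_output):
        value = int(match.group(1))
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered


def resolve_command(bytenr: int, mountpoint: str) -> List[str]:
    """Build the logical-resolve command line for one logical address."""
    return ["btrfs", "inspect-internal", "logical-resolve",
            str(bytenr), mountpoint]


def resolve_paths(bytenrs: List[int], mountpoint: str) -> List[str]:
    """Run logical-resolve for every bytenr and return the unique paths."""
    paths = set()
    for bytenr in bytenrs:
        result = subprocess.run(resolve_command(bytenr, mountpoint),
                                capture_output=True, text=True, check=False)
        paths.update(line for line in result.stdout.splitlines() if line)
    return sorted(paths)
```

Deduplicating first keeps the same block from being resolved once per
mirror.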
> Of course I
> could have booted from flash and mounted my real fs readonly under it,
> got the filename, remounted rw and deleted the file.
> Is there a simpler, recommended workflow for this case? Maybe making a snapshot?
Making snapshots will make it worse, as you would then have to delete both
the original file and its copies in the snapshots.
Unfortunately we do not have a good workflow for that; maybe we can
add something like "btrfs rescue delete-bad-files" to do it automatically.
So far our priority is still detecting the problem. Repair is always
done from the extra copies, so there is no manual fix for data corruption.
But there is a recent kernel fix that forces direct IO to fall back to
buffered IO if the file has data checksums, to prevent badly programmed
direct IO users from causing false data checksum mismatches.
If you're sure the corruption is not caused by hardware, then upgrading
the kernel may prevent such false alerts from happening in the future.
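To illustrate the kind of false mismatch meant here: with direct IO the
checksum is computed from the user buffer, so an application that modifies
the buffer while the write is still in flight can leave on-disk data that
no longer matches the stored checksum. The toy Python model below only
simulates that race and is not kernel code; it uses zlib.crc32 as a
stand-in for the crc32c that btrfs uses by default, and the function name
is invented.

```python
import zlib


def simulate_direct_write(buffer: bytearray, modify_in_flight: bool) -> bool:
    """Return True if the stored checksum matches what reached the 'disk'."""
    # Checksum is taken from the caller's buffer at submission time.
    stored_csum = zlib.crc32(bytes(buffer))
    if modify_in_flight:
        # The application scribbles on the buffer before the IO completes.
        buffer[0] ^= 0xFF
    # The device picks up whatever the buffer holds at transfer time.
    on_disk = bytes(buffer)
    return zlib.crc32(on_disk) == stored_csum
```

With buffered IO the data is first copied into the page cache, so later
changes to the user buffer cannot alter the pages the checksum was
computed from.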
>
> BTW, btrfs dev stat keeps showing corruption_errs on both raid1
> devices. Do you recommend 'btrfs dev stats -z' to zero them out?
Since scrub and btrfs check report no more errors, it's completely fine
to zero them out.
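A quick way to see which devices still carry non-zero counters before
resetting them is to parse the "btrfs device stats" output. The Python
sketch below assumes the usual "[device].counter value" line format; the
function names are hypothetical.

```python
import re
from typing import Dict, List

# Example line: "[/dev/sda5].corruption_errs  11"
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text: str) -> Dict[str, Dict[str, int]]:
    """Map each device to its counters, e.g. {'/dev/sda5': {'corruption_errs': 11}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            counters = stats.setdefault(match.group("dev"), {})
            counters[match.group("name")] = int(match.group("value"))
    return stats


def devices_with_errors(stats: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the devices that have at least one non-zero counter."""
    return sorted(dev for dev, counters in stats.items()
                  if any(counters.values()))
```

Once scrub is clean, the counters can be reset with
`btrfs device stats -z <mountpoint>`.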
Thanks,
Qu
>
>
>
> On Tue, Apr 29, 2025 at 1:44 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2025/4/29 15:02, Qu Wenruo wrote:
>>>
>>>
>>> On 2025/4/29 11:55, Przemek Klosowski wrote:
>>>> I have a RAID1 btrfs root/home on Fedora 42 that developed what
>>>> appears to be a single data checksum error. RAM tests fine, but it's a
>>>> DELL system that had memory problems early on (years ago), that were
>>>> fixed by Dell BIOS memory tests (which changed the mem controller
>>>> settings).
>>>>
>>>> The errors seem to have started right after a scrub (see btrfs
>>>> messages from journal below)
>>>>
>>>> btrfs check --readonly --force --check-data-csum -p /dev/nvme0n1p2
>>>>
>>>> shows a cascade of errors (which seem to be increasing in number)
>>>> ..
>>>> [4/7] checking fs roots (0:00:04 elapsed, 60923
>>>> items checked)
>>>> mirror 1 bytenr 299511672832 csum 0x125beb3c expected csum 0xc8374bb5
>>>> mirror 1 bytenr 299511676928 csum 0x4c6adf72 expected csum 0xd82f54b8
>>>> mirror 2 bytenr 299511672832 csum 0x125beb3c expected csum 0xc8374bb5
>>>> mirror 2 bytenr 299511676928 csum 0x4c6adf72 expected csum 0xd82f54b8
>>>> mirror 1 bytenr 306513821696 csum 0x8941f998 expected csum 0xa5fe1bfd
>>>> mirror 1 bytenr 306513825792 csum 0x8941f998 expected csum 0x77c755d4
>>>> .. and many more
>>>>
>>>> I can recover the file with only 1 4kB block zeroed out.
>>>>
>>>> Is there a way to read the bad sector? I thought that
>>>> mount -o ro,degraded,rescue=ignoredatacsums /dev/sda5 /mnt
>>>> would read data ignoring the bad checksum? as it is, it replicates the
>>>> I/O error that is raised when reading the original file.
>>>
>>> It turns out to be a bug in the implementation, we expect to ignore bad
>>> data csum error and return the data directly, but it's not implemented
>>> if the csum tree is still valid...
>>>
>>> I'll send out a patch for that, but that will also mean with
>>> rescue=idatacsums mount option, the data will only be the first one
>>> btrfs read out.
>>
>> It is not a bug, it is already handled properly by completely ignoring
>> the data csum tree.
>>
>> I guess the problem is you're still trying to scrub with that
>> rescue=idatacsums mount option?
>>
>> That mount option is to be used with regular reads, which will not do any
>> verification now.
>>
>> Please verify whether regular reads of those files work.
>>
>> Thanks,
>> Qu
>>
>>>
>>> It'll be fine for your case, as both mirrors have the same csum.
>>>
>>>>
>>>> Do you think that deleting the file with the bad checksum will solve
>>>> this?
>>>
>>> Yes.
>>>
>>>> or should I move to rebuilding and restoring from backups?
>>>
>>> No need, "btrfs check --check-data-csum" is the most comprehensive check
>>> we have and it only reports error of data checksum so far (better than
>>> scrub because of the comprehensive metadata checks).
>>>
>>> Although you will need to find out all involved files, scrub is doing a
>>> good job resolving the path, but the output may be ratelimited.
>>>
>>> I'd recommend to craft a small script, parsing all involved unique
>>> bytenr into `btrfs ins logical` to get a full path to the affected files.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>>
>>>> Apr 26 22:41:04 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> started on devid 1
>>>> Apr 26 22:41:04 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> started on devid 2
>>>> Apr 26 22:41:36 fedora kernel: BTRFS error (device nvme0n1p2): unable
>>>> to fixup (regular) error at logical 452965761024 on dev /dev/nvme0n1p2
>>>> physical 74303995904
>>>> Apr 26 22:41:36 fedora kernel: BTRFS warning (device nvme0n1p2):
>>>> checksum error at logical 452965761024 on dev /dev/nvme0n1p2, physical
>>>> 74303995904, root 257, inode 35328, offset 26034176, length 4096,
>>>> links 1 (path: usr/lib/sysimage/rpm/rpmdb.sqlite-wal)
>>>> Apr 26 22:42:52 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> finished on devid 1 with status: 0
>>>> Apr 26 22:45:46 fedora kernel: BTRFS error (device nvme0n1p2): unable
>>>> to fixup (regular) error at logical 452965761024 on dev /dev/sda5
>>>> physical 147297468416
>>>> Apr 26 22:45:46 fedora kernel: BTRFS warning (device nvme0n1p2):
>>>> checksum error at logical 452965761024 on dev /dev/sda5, physical
>>>> 147297468416, root 257, inode 35328, offset 26034176, length 4096,
>>>> links 1 (path: usr/lib/sysimage/rpm/rpmdb.sqlite-wal)
>>>> Apr 26 22:48:45 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> finished on devid 2 with status: 0
>>>> Apr 26 22:53:23 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> started on devid 2
>>>> Apr 26 22:53:23 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> started on devid 1
>>>> Apr 26 22:53:52 fedora kernel: BTRFS error (device nvme0n1p2): unable
>>>> to fixup (regular) error at logical 452965761024 on dev /dev/nvme0n1p2
>>>> physical 74303995904
>>>> Apr 26 22:53:52 fedora kernel: BTRFS warning (device nvme0n1p2):
>>>> checksum error at logical 452965761024 on dev /dev/nvme0n1p2, physical
>>>> 74303995904, root 257, inode 35328, offset 26034176, length 4096,
>>>> links 1 (path: usr/lib/sysimage/rpm/rpmdb.sqlite-wal)
>>>> Apr 26 22:55:07 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> finished on devid 1 with status: 0
>>>> Apr 26 22:58:04 fedora kernel: BTRFS error (device nvme0n1p2): unable
>>>> to fixup (regular) error at logical 452965761024 on dev /dev/sda5
>>>> physical 147297468416
>>>> Apr 26 22:58:04 fedora kernel: BTRFS warning (device nvme0n1p2):
>>>> checksum error at logical 452965761024 on dev /dev/sda5, physical
>>>> 147297468416, root 257, inode 35328, offset 26034176, length 4096,
>>>> links 1 (path: usr/lib/sysimage/rpm/rpmdb.sqlite-wal)
>>>> Apr 26 23:01:01 fedora kernel: BTRFS info (device nvme0n1p2): scrub:
>>>> finished on devid 2 with status: 0
>>>> Apr 27 07:35:32 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:32 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
>>>> Apr 27 07:35:32 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:32 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
>>>> Apr 27 07:35:32 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:32 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
>>>> Apr 27 07:35:32 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:32 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
>>>> Apr 27 07:35:33 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:33 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
>>>> Apr 27 07:35:55 fedora kernel: btrfs_print_data_csum_error: 2
>>>> callbacks suppressed
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:55 fedora kernel: btrfs_dev_stat_inc_and_print: 2
>>>> callbacks suppressed
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x127b77ee expected csum
>>>> 0xcf4a5572 mirror 2
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
>>>> Apr 27 07:35:55 fedora kernel: BTRFS warning (device nvme0n1p2): csum
>>>> failed root 257 ino 35328 off 26079232 csum 0x862b6025 expected csum
>>>> 0xcf4a5572 mirror 1
>>>> Apr 27 07:35:55 fedora kernel: BTRFS error (device nvme0n1p2): bdev
>>>> /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
>>>>
>>>
>>>
>>
>