public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Russell Coker <russell@coker.com.au>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Scrub problem with Debian kernel 6.12.33+deb13-amd64
Date: Tue, 8 Jul 2025 08:00:35 +0930
Message-ID: <5defd3da-16df-4ffb-907b-7bbec0bf9a41@gmx.com>
In-Reply-To: <3036994.e9J7NaK4W3@dojacat>



On 2025/7/7 20:25, Russell Coker wrote:
> I ran a scrub on my laptop running the latest Debian/Testing setup.  It's a
> Thinkpad X1 Carbon Gen6 that has just been updated to the latest firmware
> (Thinkpad BIOS, management engine, and some 3rd thing on the motherboard).  It
> had crashed a few times before, which I think has been fixed by the firmware
> update; it is plausible that the crashes caused some corruption.

You missed the most important piece of information: the kernel version.

> 
> The system is running LUKS encryption.  After the monthly btrfs scrub I got
> the following in the cron output:
> 
> ERROR: there are 1 uncorrectable errors
> Starting scrub on devid 1
> scrub done for d90583c8-9284-48b4-9444-abd00924002a
> Scrub started:    Mon Jul  7 02:30:01 2025
> Status:           finished
> Duration:         0:02:46
> Total to scrub:   226.35GiB
> Rate:             1.36GiB/s
> Error summary:    csum=110693
>    Corrected:      0
>    Uncorrectable:  110693
>    Unverified:     0
> 
> I ran the following commands to get more data and got the below output.  It
> seems that we have a clear problem of btrfs dev sta reporting 0 errors when
> there were apparently many errors!

Already fixed by upstream commit ec1f3a207cdf ("btrfs: scrub: update 
device stats when an error is detected").
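For monitoring, the per-device counters printed by `btrfs dev sta` can be checked mechanically. The sketch below is a hypothetical helper (the name `parse_dev_stats` is illustrative and not part of btrfs-progs) that parses the `[device].counter  value` lines in the format quoted in this thread and keeps only the non-zero counters. It assumes one counter per line, exactly as in that output.

```python
import re

# Matches lines such as "[/dev/mapper/root].corruption_errs  689".
# The format is assumed from the `btrfs dev sta` output quoted in this thread.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)\s*$")


def parse_dev_stats(text):
    """Return {(device, counter): value} for every non-zero counter."""
    nonzero = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m and int(m.group("value")) != 0:
            nonzero[(m.group("dev"), m.group("name"))] = int(m.group("value"))
    return nonzero


sample = """\
[/dev/mapper/root].write_io_errs    0
[/dev/mapper/root].read_io_errs     0
[/dev/mapper/root].flush_io_errs    0
[/dev/mapper/root].corruption_errs  689
[/dev/mapper/root].generation_errs  0
"""
print(parse_dev_stats(sample))  # {('/dev/mapper/root', 'corruption_errs'): 689}
```

Run against the first `btrfs dev sta` output quoted below, it returns an empty dict even though scrub reported 110693 csum errors, which is the discrepancy the commit above addresses.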

> 
> root@dojacat:/var/log# btrfs dev sta /
> [/dev/mapper/root].write_io_errs    0
> [/dev/mapper/root].read_io_errs     0
> [/dev/mapper/root].flush_io_errs    0
> [/dev/mapper/root].corruption_errs  0
> [/dev/mapper/root].generation_errs  0
> root@dojacat:/var/log# btrfs scrub status /
> UUID:             d90583c8-9284-48b4-9444-abd00924002a
> Scrub started:    Mon Jul  7 02:30:01 2025
> Status:           finished
> Duration:         0:02:46
> Total to scrub:   226.34GiB
> Rate:             1.36GiB/s
> Error summary:    csum=110693
>    Corrected:      0
>    Uncorrectable:  110693
>    Unverified:     0
> 
> 
> [190966.907320] BTRFS info (device dm-0): scrub: started on devid 1
> [191057.409078] scrub_stripe_report_errors: 110553 callbacks suppressed
> [191057.409081] scrub_stripe_report_errors: 110576 callbacks suppressed
> [191057.409084] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469629440 on dev /dev/mapper/root physical 147760480256
> [191057.409138] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469563904 on dev /dev/mapper/root physical 147760414720
> [191057.409300] _btrfs_printk: 290 callbacks suppressed
> [191057.409303] BTRFS warning (device dm-0): checksum error at logical
> 327469629440 on dev /dev/mapper/root, physical 147760480256, root 540, inode
> 1826602, offset 2087845888, length 4096, links 1 (path: home.old/tv/Foo.
> 2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv)
> 
> [many more about similar files]
> 
> [191057.410987] BTRFS warning (device dm-0): checksum error at logical
> 327469629440 on dev /dev/mapper/root, physical 147760480256, root 522, inode
> 174508, offset 2087845888, length 4096, links 1 (path: tv/Foo.
> 2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv)
> [191057.411281] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469629440 on dev /dev/mapper/root physical 147760480256
> [191057.411285] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469563904 on dev /dev/mapper/root physical 147760414720
> [191057.411458] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469432832 on dev /dev/mapper/root physical 147760283648
> [191057.411461] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469367296 on dev /dev/mapper/root physical 147760218112
> [191057.411907] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469498368 on dev /dev/mapper/root physical 147760349184
> [191057.413012] BTRFS error (device dm-0): unable to fixup (regular) error at
> logical 327469629440 on dev /dev/mapper/root physical 147760480256
> [191131.353819] BTRFS info (device dm-0): scrub: finished on devid 1 with
> status: 0
> 
> # md5sum Foo.2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv
> md5sum: Foo.2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv: Input/output error
> 
> The files in question had been subject to "cp -a --reflink=auto" across
> subvols.  When I deleted them from one subvol and deleted the snapshots of
> that subvol, I ran another scrub and now I see the following:
> 
> # /bin/btrfs scrub start -B /
> Starting scrub on devid 1
> scrub done for d90583c8-9284-48b4-9444-abd00924002a
> Scrub started:    Mon Jul  7 20:33:18 2025
> Status:           finished
> Duration:         0:03:01
> Total to scrub:   220.04GiB
> Rate:             1.21GiB/s
> Error summary:    csum=110693
>    Corrected:      0
>    Uncorrectable:  110693
>    Unverified:     0
> ERROR: there are 1 uncorrectable errors
> # btrfs dev sta /
> [/dev/mapper/root].write_io_errs    0
> [/dev/mapper/root].read_io_errs     0
> [/dev/mapper/root].flush_io_errs    0
> [/dev/mapper/root].corruption_errs  689
> [/dev/mapper/root].generation_errs  0
> 
> So it looks like the failure to report error counts in btrfs dev sta may be
> related to cp --reflink=auto across subvols.  The csum=110693 doesn't match
> the "corruption_errs  689" but at least it's not 0.
> 
> I removed another file that was listed as having uncorrectable errors and now
> I get the following:
> 
> # /bin/btrfs scrub start -B /
> Starting scrub on devid 1
> scrub done for d90583c8-9284-48b4-9444-abd00924002a
> Scrub started:    Mon Jul  7 20:46:05 2025
> Status:           finished
> Duration:         0:02:17
> Total to scrub:   173.88GiB
> Rate:             1.27GiB/s
> Error summary:    csum=7137
>    Corrected:      0
>    Uncorrectable:  7137
>    Unverified:     0
> ERROR: there are 1 uncorrectable errors
> 
> Below are the kernel messages.  There is no mention of files or directories,
> so the scrub doesn't seem to be doing its job well here.  It should either fix
> things or tell me what rm command I can use to replace things that can't be
> fixed!


> Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7117
> callbacks suppressed

So the file path output may have been rate limited.
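The "callbacks suppressed" lines come from kernel rate limiting: within each time window only a fixed burst of messages is printed, the rest are only counted, and that count is reported roughly when the next window opens. Below is a simplified Python model of that interval/burst scheme, not the kernel code itself; the class name is hypothetical and the 5-second interval and burst of 10 are assumed defaults chosen for illustration.

```python
class RateLimiter:
    """Simplified interval/burst rate limiter (illustrative model only)."""

    def __init__(self, interval=5.0, burst=10):
        self.interval = interval  # window length in seconds (assumed default)
        self.burst = burst        # messages allowed per window (assumed default)
        self.begin = None         # start time of the current window
        self.printed = 0          # messages emitted in the current window
        self.missed = 0           # messages dropped in the current window

    def allow(self, now):
        """Return (emit, suppressed): whether to print this message, and how
        many earlier messages were dropped if a new window just opened."""
        suppressed = 0
        if self.begin is None:
            self.begin = now
        if now - self.begin > self.interval:
            # Window expired: report the drop count and start a fresh window.
            suppressed, self.missed = self.missed, 0
            self.begin = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True, suppressed
        self.missed += 1
        return False, suppressed


rl = RateLimiter()
# 1000 error messages arriving at the same instant: only the burst is printed.
results = [rl.allow(0.0) for _ in range(1000)]
print(sum(emit for emit, _ in results))  # 10
# The first message after the window expires carries the dropped count.
print(rl.allow(6.0))  # (True, 990)
```

In this model, a scrub that hits thousands of bad sectors in a few seconds prints only the first handful of messages, which is why most of the per-file warnings never reach the log.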

Also, dmesg is not the best way to tell end users where the corruption is.
Furthermore, there are many valid reasons a path cannot be resolved (the
inode is already an orphan, the range belongs to a tree block, etc.).


I believe you're right that btrfs should have a reliable way to indicate
which files are affected, but we do not want to flood dmesg without any
limit either.

In the future we may have a better solution, but for now the best option
is to take the logical bytenr (e.g. 327893450752) and pass it to
`btrfs ins logical-resolve` to resolve the paths.
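A rough sketch of that workflow: collect the distinct logical addresses from scrub messages like those quoted in this thread, then build one `btrfs inspect-internal logical-resolve <logical> <path>` command for each (`ins` is the abbreviated form of `inspect-internal`). The helper names and the `/` mount point are illustrative assumptions, and the regular expression assumes each kernel message sits on a single line, as in raw dmesg or journalctl output rather than wrapped as in this email.

```python
import re
import shlex

# Matches both "unable to fixup (regular) error at logical N" and
# "checksum error at logical N" messages (format assumed from this thread).
LOGICAL_RE = re.compile(r"error at logical (\d+)")


def extract_logical_addresses(log_text):
    """Return distinct logical bytenrs in order of first appearance."""
    found = (int(m.group(1)) for m in LOGICAL_RE.finditer(log_text))
    return list(dict.fromkeys(found))


def resolve_commands(addresses, mountpoint="/"):
    """Build one logical-resolve argv list per address (hypothetical helper)."""
    return [
        ["btrfs", "inspect-internal", "logical-resolve", str(addr), mountpoint]
        for addr in addresses
    ]


log = (
    "BTRFS error (device dm-0): unable to fixup (regular) error at logical "
    "327893450752 on dev /dev/mapper/root physical 148184301568\n"
    "BTRFS error (device dm-0): unable to fixup (regular) error at logical "
    "327893450752 on dev /dev/mapper/root physical 148184301568\n"
)
for argv in resolve_commands(extract_logical_addresses(log)):
    # Print instead of executing; pass argv to subprocess.run() as root to run it.
    print(shlex.join(argv))
```

Deduplicating first matters because the same logical address is usually reported many times. As noted above, a command may still legitimately resolve to nothing, for example when the range belongs to a tree block.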

Another solution is to enable CONFIG_BTRFS_DEBUG=y, which disables the 
rate limit, but I do not believe any distro would enable that for 
regular kernels.

Thanks,
Qu

> 
> Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7116 callbacks
> suppressed
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7117 callbacks
> suppressed
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
> (regular) error at logical 327893450752 on dev /dev/mapper/root physical
> 148184301568
> 
> I don't think that BTRFS is responsible for the data loss here, I think that
> is entirely due to the system crashing.  But BTRFS really isn't handling the
> recovery as well as I think it should and could.
> 



Thread overview: 3+ messages
2025-07-07 10:55 Scrub problem with Debian kernel 6.12.33+deb13-amd64 Russell Coker
2025-07-07 22:30 ` Qu Wenruo [this message]
2025-07-11 21:03   ` Nicholas D Steeves
