* Scrub problem with Debian kernel 6.12.33+deb13-amd64
From: Russell Coker @ 2025-07-07 10:55 UTC
To: Btrfs BTRFS
I ran a scrub on my laptop running the latest Debian/Testing setup. It's a
ThinkPad X1 Carbon Gen6 that has just been updated to the latest firmware
(ThinkPad BIOS, management engine, and a third component on the motherboard).
It had crashed a few times before, which I think the firmware update has
fixed; it is plausible that the crashes caused some corruption.
The system is running LUKS encryption. After the monthly btrfs scrub I got
the following in the cron output:
ERROR: there are 1 uncorrectable errors
Starting scrub on devid 1
scrub done for d90583c8-9284-48b4-9444-abd00924002a
Scrub started: Mon Jul 7 02:30:01 2025
Status: finished
Duration: 0:02:46
Total to scrub: 226.35GiB
Rate: 1.36GiB/s
Error summary: csum=110693
Corrected: 0
Uncorrectable: 110693
Unverified: 0
I ran the following commands to get more data and got the output below. There
seems to be a clear problem: btrfs dev sta reports 0 errors when there were
apparently many errors!
root@dojacat:/var/log# btrfs dev sta /
[/dev/mapper/root].write_io_errs 0
[/dev/mapper/root].read_io_errs 0
[/dev/mapper/root].flush_io_errs 0
[/dev/mapper/root].corruption_errs 0
[/dev/mapper/root].generation_errs 0
root@dojacat:/var/log# btrfs scrub status /
UUID: d90583c8-9284-48b4-9444-abd00924002a
Scrub started: Mon Jul 7 02:30:01 2025
Status: finished
Duration: 0:02:46
Total to scrub: 226.34GiB
Rate: 1.36GiB/s
Error summary: csum=110693
Corrected: 0
Uncorrectable: 110693
Unverified: 0
[190966.907320] BTRFS info (device dm-0): scrub: started on devid 1
[191057.409078] scrub_stripe_report_errors: 110553 callbacks suppressed
[191057.409081] scrub_stripe_report_errors: 110576 callbacks suppressed
[191057.409084] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469629440 on dev /dev/mapper/root physical 147760480256
[191057.409138] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469563904 on dev /dev/mapper/root physical 147760414720
[191057.409300] _btrfs_printk: 290 callbacks suppressed
[191057.409303] BTRFS warning (device dm-0): checksum error at logical
327469629440 on dev /dev/mapper/root, physical 147760480256, root 540, inode
1826602, offset 2087845888, length 4096, links 1 (path: home.old/tv/Foo.
2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv)
[many more about similar files]
[191057.410987] BTRFS warning (device dm-0): checksum error at logical
327469629440 on dev /dev/mapper/root, physical 147760480256, root 522, inode
174508, offset 2087845888, length 4096, links 1 (path: tv/Foo.
2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv)
[191057.411281] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469629440 on dev /dev/mapper/root physical 147760480256
[191057.411285] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469563904 on dev /dev/mapper/root physical 147760414720
[191057.411458] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469432832 on dev /dev/mapper/root physical 147760283648
[191057.411461] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469367296 on dev /dev/mapper/root physical 147760218112
[191057.411907] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469498368 on dev /dev/mapper/root physical 147760349184
[191057.413012] BTRFS error (device dm-0): unable to fixup (regular) error at
logical 327469629440 on dev /dev/mapper/root physical 147760480256
[191131.353819] BTRFS info (device dm-0): scrub: finished on devid 1 with
status: 0
# md5sum Foo.2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv
md5sum: Foo.2024.S01E08.1080p.WEB.H264-SuccessfulCrab.mkv: Input/output error
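The md5sum failure shows that the damaged sectors surface as EIO on read. As a rough way to map which parts of a file are affected without relying on the rate-limited kernel log, one could read the file in fixed-size blocks and record every offset whose read fails. A minimal Python sketch, assuming 4096-byte blocks (the length in the checksum warnings above); unreadable_blocks is a hypothetical helper, not part of btrfs-progs:

```python
import errno
import os


def unreadable_blocks(path: str, block_size: int = 4096) -> list[int]:
    """Return the file offsets of blocks whose read fails with EIO.

    Sketch only: assumes csum failures surface as EIO on read, as the
    md5sum failure above does; any other OSError is re-raised.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for offset in range(0, size, block_size):
            try:
                os.pread(fd, block_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                bad.append(offset)  # this block could not be read back
    finally:
        os.close(fd)
    return bad
```

One pread per 4 KiB makes this slow on multi-GiB files, but it needs nothing beyond the standard library.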
The files in question had been copied across subvols with
"cp -a --reflink=auto". After I deleted them from one subvol and deleted the
snapshots of that subvol, I ran another scrub and now see the following:
# /bin/btrfs scrub start -B /
Starting scrub on devid 1
scrub done for d90583c8-9284-48b4-9444-abd00924002a
Scrub started: Mon Jul 7 20:33:18 2025
Status: finished
Duration: 0:03:01
Total to scrub: 220.04GiB
Rate: 1.21GiB/s
Error summary: csum=110693
Corrected: 0
Uncorrectable: 110693
Unverified: 0
ERROR: there are 1 uncorrectable errors
# btrfs dev sta /
[/dev/mapper/root].write_io_errs 0
[/dev/mapper/root].read_io_errs 0
[/dev/mapper/root].flush_io_errs 0
[/dev/mapper/root].corruption_errs 689
[/dev/mapper/root].generation_errs 0
So it looks like the failure to report error counts in btrfs dev sta may be
related to cp --reflink=auto across subvols. The csum=110693 count doesn't
match the "corruption_errs 689", but at least it's not 0.
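The symptom here (scrub reporting uncorrectable csum errors while corruption_errs stays at 0) is easy to check mechanically from the two outputs. A minimal Python sketch that parses text in the formats shown above; the helper names are hypothetical, and it deliberately only flags the all-zero case, since this thread shows the two counts can differ (110693 vs 689):

```python
import re

# Lines look like "[/dev/mapper/root].corruption_errs 689".
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text: str) -> dict[str, dict[str, int]]:
    """Map device -> {counter name: value} from `btrfs dev sta` output."""
    stats: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["name"]] = int(m["value"])
    return stats


def parse_uncorrectable(scrub_text: str) -> int:
    """Return the "Uncorrectable:" count from scrub output (0 if absent)."""
    m = re.search(r"Uncorrectable:\s+(\d+)", scrub_text)
    return int(m.group(1)) if m else 0


def stats_look_stale(stats_text: str, scrub_text: str) -> bool:
    """True if scrub found uncorrectable errors but no corruption_errs were counted."""
    if parse_uncorrectable(scrub_text) == 0:
        return False
    stats = parse_device_stats(stats_text)
    return sum(dev.get("corruption_errs", 0) for dev in stats.values()) == 0
```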
I removed another file that was listed as having uncorrectable errors and now
I get the following:
# /bin/btrfs scrub start -B /
Starting scrub on devid 1
scrub done for d90583c8-9284-48b4-9444-abd00924002a
Scrub started: Mon Jul 7 20:46:05 2025
Status: finished
Duration: 0:02:17
Total to scrub: 173.88GiB
Rate: 1.27GiB/s
Error summary: csum=7137
Corrected: 0
Uncorrectable: 7137
Unverified: 0
ERROR: there are 1 uncorrectable errors
Below are the kernel messages. There is no mention of files or directories, so
the scrub doesn't seem to be doing its job well here. It should either fix
things or tell me which files I need to rm and replace because they can't be
fixed!
Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7116 callbacks
suppressed
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7117 callbacks
suppressed
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
Jul 07 20:47:20 dojacat kernel: BTRFS error (device dm-0): unable to fixup
(regular) error at logical 327893450752 on dev /dev/mapper/root physical
148184301568
I don't think that BTRFS is responsible for the data loss here; I think that
is entirely due to the system crashing. But BTRFS really isn't handling the
recovery as well as I think it should and could.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: Scrub problem with Debian kernel 6.12.33+deb13-amd64
From: Qu Wenruo @ 2025-07-07 22:30 UTC
To: Russell Coker, Btrfs BTRFS
On 2025/7/7 20:25, Russell Coker wrote:
> I ran a scrub on my laptop running the latest Debian/Testing setup. It's a
> ThinkPad X1 Carbon Gen6 that has just been updated to the latest firmware
> (ThinkPad BIOS, management engine, and a third component on the motherboard).
> It had crashed a few times before, which I think the firmware update has
> fixed; it is plausible that the crashes caused some corruption.
You missed the most important thing: the kernel version.
>
[snip]
>
> I ran the following commands to get more data and got the output below. There
> seems to be a clear problem: btrfs dev sta reports 0 errors when there were
> apparently many errors!
Already fixed by upstream commit ec1f3a207cdf ("btrfs: scrub: update
device stats when an error is detected").
>
[snip]
>
> Below are the kernel messages. There is no mention of files or directories, so
> the scrub doesn't seem to be doing its job well here. It should either fix
> things or tell me which files I need to rm and replace because they can't be
> fixed!
> Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7117
> callbacks suppressed
So the file path output may be rate limited.
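The "N callbacks suppressed" lines are the signature of printk-style rate limiting: within each time window only a burst of messages gets printed, the rest are counted, and the count is reported once the next window opens. A simplified Python model of that behaviour, not the kernel's implementation; the 5 second interval and burst of 10 are assumed defaults, and the class name is hypothetical:

```python
class RateLimit:
    """Simplified fixed-window model of printk-style rate limiting (hypothetical)."""

    def __init__(self, name: str, interval: float = 5.0, burst: int = 10) -> None:
        self.name = name          # prefix used in the "callbacks suppressed" line
        self.interval = interval  # window length in seconds (assumed default)
        self.burst = burst        # messages allowed per window (assumed default)
        self.begin = None         # start time of the current window
        self.printed = 0
        self.missed = 0
        self.log = []             # lines that would reach the log

    def allow(self, now: float) -> bool:
        """Return True if a message at time `now` may be printed."""
        if self.begin is None:
            self.begin = now
        if now > self.begin + self.interval:
            # The window expired: report what was dropped, then start a new one.
            if self.missed:
                self.log.append(f"{self.name}: {self.missed} callbacks suppressed")
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False

    def emit(self, now: float, msg: str) -> None:
        """Log `msg` if the rate limit allows it; otherwise just count it."""
        if self.allow(now):
            self.log.append(msg)
```

In this model, 100 messages arriving in one window leave only the first 10 in the log, and the next window opens with "scrub: 90 callbacks suppressed", which is why most per-file warnings from a scrub with thousands of errors never show up.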
Also, dmesg is not the best way to tell end users where the corruption is;
furthermore, there are many valid reasons why a path cannot be resolved
(the inode is already an orphan, the sector belongs to a tree block, etc.).
I believe you're right that btrfs should have a reliable way to
indicate which files are affected, but we do not want to flood
dmesg without limit either.
In the future we may have a better solution, but for now the best
solution is to take the logical bytenr (e.g. 327893450752) and pass
it to `btrfs ins logical-resolve` to resolve the path.
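A minimal Python sketch of that workflow, assuming the error-line wording shown in this thread: collect every logical bytenr from the scrub error messages, de-duplicate them, and emit one `btrfs inspect-internal logical-resolve <logical> <path>` command per address (`btrfs ins` is the abbreviated form). The helper names and the default mount point are illustrative assumptions:

```python
import re
import shlex

# "... error at logical N ..." covers both the "unable to fixup (regular)
# error" and "checksum error" lines; \s+ also tolerates wrapped lines.
LOGICAL_RE = re.compile(r"error\s+at\s+logical\s+(\d+)")


def logical_addresses(log_text: str) -> list[int]:
    """Return the unique logical bytenrs named in btrfs scrub errors, sorted."""
    return sorted({int(m.group(1)) for m in LOGICAL_RE.finditer(log_text)})


def resolve_commands(log_text: str, mountpoint: str = "/") -> list[str]:
    """Build one logical-resolve command line per unique logical bytenr."""
    return [
        f"btrfs inspect-internal logical-resolve {addr} {shlex.quote(mountpoint)}"
        for addr in logical_addresses(log_text)
    ]
```

The input could be the `journalctl -k` output from the scrub run, with the printed commands then run as root. Because the error lines are themselves rate limited, this only covers addresses that made it into the log, and some may legitimately fail to resolve.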
Another solution is to enable CONFIG_BTRFS_DEBUG=y, which disables the
rate limit, but I do not believe any distro would enable that for
regular kernels.
Thanks,
Qu
* Re: Scrub problem with Debian kernel 6.12.33+deb13-amd64
From: Nicholas D Steeves @ 2025-07-11 21:03 UTC
To: Qu Wenruo, Russell Coker, Btrfs BTRFS
Qu Wenruo <quwenruo.btrfs@gmx.com> writes:
> On 2025/7/7 20:25, Russell Coker wrote:
>> I ran a scrub on my laptop running the latest Debian/Testing setup. It's a
>> ThinkPad X1 Carbon Gen6 that has just been updated to the latest firmware
>> (ThinkPad BIOS, management engine, and a third component on the motherboard).
>> It had crashed a few times before, which I think the firmware update has
>> fixed; it is plausible that the crashes caused some corruption.
>
> You missed the most important thing: the kernel version.
Well, it was in the subject line (6.12.33, Debian revision 13), but
I agree that it's best to put it in both places.
[snip]
>>
>> I ran the following commands to get more data and got the output below. There
>> seems to be a clear problem: btrfs dev sta reports 0 errors when there were
>> apparently many errors!
Russell, was this bug reported in Debian? IMHO this one is kind of a
big deal when we're at the release candidate stage. Please feel free to
add me to the X-Debbugs-Cc pseudoheader if/when you report btrfs-related
Debian bugs in the future. Also, since I seem to remember that you're
excellent at finding and reporting bugs, maybe you'd like to co-found a
btrfs-enablement team?
> Already fixed by upstream commit ec1f3a207cdf ("btrfs: scrub: update
> device stats when an error is detected").
That is commit 5bd799d2 on the linux-6.12.y branch, first present in
v6.12.34, so Debian 13 (trixie) is no longer affected; however, trixie
was affected when Russell reported this issue.
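Checking whether a machine runs a fixed 6.12.y kernel only needs the release string. A minimal Python sketch, assuming the uname release carries the upstream stable version the way "6.12.33+deb13-amd64" does; it only covers the 6.12.y series, where the fix first appeared in v6.12.34, and the helper names are hypothetical:

```python
import re

# Per the thread: the 6.12.y backport (5bd799d2) first shipped in v6.12.34.
FIXED_IN_6_12 = (6, 12, 34)


def parse_release(release: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from e.g. "6.12.33+deb13-amd64"."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g or 0) for g in m.groups())


def has_6_12_stats_fix(release: str) -> bool:
    """True if a 6.12.y release includes the device-stats fix."""
    version = parse_release(release)
    if version[:2] != (6, 12):
        # Other series are outside what this thread covers; not handled here.
        raise ValueError("this sketch only covers the 6.12.y series")
    return version >= FIXED_IN_6_12
```

On the running system this could be called as has_6_12_stats_fix(platform.release()).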
[snip]
>> ERROR: there are 1 uncorrectable errors
>>
>> Below are the kernel messages. There is no mention of files or directories, so
>> the scrub doesn't seem to be doing its job well here. It should either fix
>> things or tell me which files I need to rm and replace because they can't be
>> fixed!
>
>
> > Jul 07 20:47:20 dojacat kernel: scrub_stripe_report_errors: 7117
> > callbacks suppressed
>
> So the file path output may be rate limited.
>
> Also, dmesg is not the best way to tell end users where the corruption is;
> furthermore, there are many valid reasons why a path cannot be resolved
> (the inode is already an orphan, the sector belongs to a tree block, etc.).
I agree!
> I believe you're right that btrfs should have a reliable way to
> indicate which files are affected, but we do not want to flood
> dmesg without limit either.
>
> In the future we may have a better solution, but for now the best
> solution is to take the logical bytenr (e.g. 327893450752) and pass
> it to `btrfs ins logical-resolve` to resolve the path.
Are logical bytenr numbers stored anywhere outside of the kernel log? I
hope they're stored in filesystem metadata :)
Regards,
Nicholas