* 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Chris Murphy @ 2025-09-13 21:57 UTC
To: Btrfs BTRFS
kernel 6.17.0-0.rc5.42.fc43.x86_64
btrfs-progs 6.16-1.fc42.x86_64
I started a scrub and walked away; when I came back, the system appeared hung, with a black screen and no response. I gave it maybe 30 seconds, then gave up and forced a power off. The journal preserved what was going on.
The full dmesg is attached here:
https://bugzilla.redhat.com/show_bug.cgi?id=2394998
dmesg excerpt:
[ 8088.052124] kernel: BTRFS info (device dm-1): scrub: started on devid 1
[ 9662.647055] kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 9662.689046] kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 9662.793052] kernel: wlp0s20f3: deauthenticating from a4:22:49:b2:cb:a6 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 9727.984200] kernel: PM: suspend entry (deep)
[ 9727.991082] kernel: Filesystems sync: 0.007 seconds
[ 9748.172951] kernel: Freezing user space processes
[ 9748.173350] kernel: Freezing user space processes failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 9748.173520] kernel: task:btrfs state:D stack:0 pid:15156 tgid:15155 ppid:4043 task_flags:0x440140 flags:0x00004006
[ 9748.173653] kernel: Call Trace:
[ 9748.173768] kernel: <TASK>
[ 9748.173884] kernel: __schedule+0x2f9/0x7b0
[ 9748.174026] kernel: schedule+0x27/0x80
[ 9748.174166] kernel: io_schedule+0x46/0x70
[ 9748.174295] kernel: blk_mq_get_tag+0x11d/0x2d0
[ 9748.174444] kernel: ? __pfx_autoremove_wake_function+0x10/0x10
[ 9748.174545] kernel: __blk_mq_alloc_requests+0xb0/0x2b0
[ 9748.174651] kernel: blk_mq_submit_bio+0x2c3/0x890
[ 9748.174764] kernel: __submit_bio+0x74/0x280
[ 9748.174855] kernel: __submit_bio_noacct+0x90/0x210
[ 9748.174925] kernel: btrfs_submit_chunk+0x1a2/0x6c0
[ 9748.175027] kernel: ? __pfx_scrub_read_endio+0x10/0x10
[ 9748.175118] kernel: btrfs_submit_bbio+0x1a/0x30
[ 9748.175184] kernel: submit_initial_group_read+0x8a/0x1d0
[ 9748.175264] kernel: scrub_simple_mirror+0x26f/0x310
[ 9748.175372] kernel: scrub_stripe+0x512/0x7a0
[ 9748.175445] kernel: scrub_chunk+0xd0/0x170
[ 9748.175508] kernel: scrub_enumerate_chunks+0x319/0x710
[ 9748.175571] kernel: btrfs_scrub_dev+0x225/0x660
[ 9748.175641] kernel: btrfs_ioctl+0xe77/0x15d0
[ 9748.175710] kernel: __x64_sys_ioctl+0x94/0xe0
[ 9748.175779] kernel: do_syscall_64+0x82/0x2c0
[ 9748.175848] kernel: ? __lruvec_stat_mod_folio+0x85/0xd0
[ 9748.175919] kernel: ? xas_load+0x11/0x100
[ 9748.176032] kernel: ? xas_find+0x83/0x1b0
[ 9748.176116] kernel: ? next_uptodate_folio+0xa0/0x350
[ 9748.176186] kernel: ? filemap_map_pages+0x35c/0x5a0
[ 9748.176255] kernel: ? memcg1_check_events+0x60/0x1d0
[ 9748.176325] kernel: ? do_read_fault+0x107/0x260
[ 9748.176393] kernel: ? handle_pte_fault+0x118/0x240
[ 9748.176461] kernel: ? do_fault+0x150/0x260
[ 9748.176523] kernel: ? __handle_mm_fault+0x551/0x6a0
[ 9748.176591] kernel: ? count_memcg_events+0xd6/0x220
[ 9748.176670] kernel: ? handle_mm_fault+0x248/0x360
[ 9748.176740] kernel: ? do_user_addr_fault+0x21a/0x690
[ 9748.176803] kernel: ? exc_page_fault+0x74/0x180
[ 9748.176873] kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9748.176943] kernel: RIP: 0033:0x7f4a739060ed
[ 9748.176996] kernel: RSP: 002b:00007f4a737aec50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 9748.177102] kernel: RAX: ffffffffffffffda RBX: 000055e4140b79e0 RCX: 00007f4a739060ed
[ 9748.177181] kernel: RDX: 000055e4140b79e0 RSI: 00000000c400941b RDI: 0000000000000003
[ 9748.177251] kernel: RBP: 00007f4a737aeca0 R08: 0000000000000020 R09: 31203a6b63617473
[ 9748.177330] kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4a737af6c0
[ 9748.177399] kernel: R13: 00007ffe1aba7a10 R14: 00007f4a737afcdc R15: 00007ffe1aba7b17
[ 9748.177461] kernel: </TASK>
[ 9748.177531] kernel: OOM killer enabled.
[ 9748.177593] kernel: Restarting tasks: Starting
[ 9748.177678] kernel: Restarting tasks: Done
[ 9748.177746] kernel: random: crng reseeded on system resumption
[ 9748.318065] kernel: PM: suspend exit
[ 9748.318375] kernel: PM: suspend entry (s2idle)
[ 9748.341048] kernel: Filesystems sync: 0.021 seconds
[ 9768.348446] kernel: Freezing user space processes
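A note for anyone triaging a report like this: the two useful pieces are the freezer's "refusing to freeze" line and the stuck task's stack. Here the task is `btrfs` (pid 15156) in state D, sleeping in blk_mq_get_tag() (waiting for a free block-layer request tag) while the scrub path under btrfs_scrub_dev() submits reads. The 20-second limit matches the default of /sys/power/pm_freeze_timeout (20000 ms). The Python helper below is a minimal sketch that assumes the journal line format shown above; `summarize_freeze_failure` and its regexes are hypothetical, not part of any existing tool. It skips frames prefixed with `?`, which the kernel unwinder prints for addresses it cannot confirm are part of the call chain.

```python
import re

# Patterns assume journal-style kernel lines like the excerpt above.
FAIL_RE = re.compile(
    r"Freezing user space processes failed after ([\d.]+) seconds "
    r"\((\d+) tasks refusing to freeze"
)
TASK_RE = re.compile(r"task:(\S+) state:(\S+) .*?\bpid:(\d+)")
FRAME_RE = re.compile(r"kernel:\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x")


def summarize_freeze_failure(log_text):
    """Return the freezer timeout, the stuck task, and its reliable
    (not '?'-marked) stack frames, innermost first."""
    summary = {"timeout_s": None, "tasks": None, "task": None,
               "state": None, "pid": None, "frames": []}
    in_trace = False
    for line in log_text.splitlines():
        if m := FAIL_RE.search(line):
            summary["timeout_s"] = float(m.group(1))
            summary["tasks"] = int(m.group(2))
        elif m := TASK_RE.search(line):
            summary["task"], summary["state"] = m.group(1), m.group(2)
            summary["pid"] = int(m.group(3))
        elif "<TASK>" in line:
            in_trace = True
        elif "</TASK>" in line:
            in_trace = False
        elif in_trace and (m := FRAME_RE.search(line)) and not m.group(1):
            # group(1) is the "? " marker; keep only unmarked frames.
            summary["frames"].append(m.group(2))
    return summary


if __name__ == "__main__":
    sample = """\
[ 9748.173350] kernel: Freezing user space processes failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 9748.173520] kernel: task:btrfs state:D stack:0 pid:15156 tgid:15155 ppid:4043 task_flags:0x440140 flags:0x00004006
[ 9748.173768] kernel: <TASK>
[ 9748.174295] kernel: blk_mq_get_tag+0x11d/0x2d0
[ 9748.174444] kernel: ? __pfx_autoremove_wake_function+0x10/0x10
[ 9748.175508] kernel: scrub_enumerate_chunks+0x319/0x710
[ 9748.177461] kernel: </TASK>"""
    print(summarize_freeze_failure(sample))
```

Run against the full excerpt, the frame list is the unmarked part of the call trace, from __schedule down to entry_SYSCALL_64_after_hwframe.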
--
Chris Murphy
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Chris Murphy @ 2025-09-15 23:16 UTC
To: Btrfs BTRFS
The storage stack may be relevant: USB flash drive -> dm-crypt -> Btrfs
Darrick Wong notes that in https://elixir.bootlin.com/linux/v6.15/source/fs/btrfs/ioctl.c#L3159
btrfs_ioctl_scrub calls mnt_want_write_file for the duration of the scrub: mnt_want_write_file takes SB_FREEZE_WRITE and holds it all the way to the end, which means you can't fsfreeze the filesystem while a scrub is running.
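A rough userspace model of this point, in Python: a writer that takes freeze protection once and holds it for the whole scrub means a freeze can never finish while the scrub runs. `FreezeProtection`, `start_write`, `end_write`, `freeze` and `long_scrub` are hypothetical names loosely modelled on sb_start_write()/sb_end_write(); this is not kernel code. The timeout in freeze() exists only so the demo terminates. It loosely mirrors the PM freezer's 20 s limit, whereas, as I understand it, the kernel's freeze_super() simply waits for writers to drain.

```python
import threading
import time


class FreezeProtection:
    """Toy model of write freeze protection (hypothetical; loosely inspired
    by sb_start_write()/sb_end_write(), not kernel code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._writers = 0
        self._frozen = False

    def start_write(self):
        # New writers wait while a freeze is in progress or complete.
        with self._cond:
            while self._frozen:
                self._cond.wait()
            self._writers += 1

    def end_write(self):
        with self._cond:
            self._writers -= 1
            self._cond.notify_all()

    def freeze(self, timeout):
        """Block new writers, then wait up to `timeout` seconds for existing
        writers to drain. Returns True if the freeze completed."""
        with self._cond:
            self._frozen = True
            drained = self._cond.wait_for(lambda: self._writers == 0, timeout)
            if not drained:
                # Abort the freeze and let blocked writers continue.
                self._frozen = False
                self._cond.notify_all()
            return drained

    def thaw(self):
        with self._cond:
            self._frozen = False
            self._cond.notify_all()


def long_scrub(fp, chunks, started):
    """Models a scrub that takes write access once and holds it until
    every chunk has been read (hypothetical stand-in for the ioctl)."""
    fp.start_write()
    started.set()
    try:
        for _ in range(chunks):
            time.sleep(0.05)  # stand-in for reading one chunk
    finally:
        fp.end_write()


if __name__ == "__main__":
    fp = FreezeProtection()
    started = threading.Event()
    scrubber = threading.Thread(target=long_scrub, args=(fp, 20, started))
    scrubber.start()
    started.wait()
    print("freeze during scrub:", fp.freeze(timeout=0.2))  # expected: False
    scrubber.join()
    print("freeze after scrub:", fp.freeze(timeout=0.2))   # expected: True
    fp.thaw()
```

The model only shows why a freeze cannot complete while the scrub holds write access; it says nothing about why the machine in this report hung instead of simply failing to suspend.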
So how did this ever work? Folks do use btrfsmaintenance with scrub and trim timers, and a laptop can sleep at any time. We can't inhibit sleep indefinitely.
Perhaps scrub and balance could be paused when a PM suspend/hibernate is requested? That would make them a non-factor.
Chris Murphy
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Qu Wenruo @ 2025-09-15 23:37 UTC
To: Chris Murphy, Btrfs BTRFS
On 2025/9/16 08:46, Chris Murphy wrote:
> The storage stack may be relevant: USB flash drive -> dm-crypt -> Btrfs
As for your original report of the system not responding: since btrfs is
only blocking the suspend, there should be no additional reason for the
system to hang, unless some other corner case is involved, such as the
USB device being powered off.
If you can reproduce the bug, please capture the dying messages using
something like netconsole.
>
> Darrick Wong notes that in https://elixir.bootlin.com/linux/v6.15/source/fs/btrfs/ioctl.c#L3159
>
> btrfs_ioctl_scrub calls mnt_want_write_file for the duration of the scrub: mnt_want_write_file takes SB_FREEZE_WRITE and holds it all the way to the end, which means you can't fsfreeze the filesystem while a scrub is running.
Yes, that's already a known problem, and both David and I have worked on
it in the past:
https://lore.kernel.org/linux-btrfs/9606fae20bff6c1fbe14dc7b067f3b333c2a955b.1751847905.git.wqu@suse.com/
https://lore.kernel.org/linux-btrfs/20250708132540.28285-1-dsterba@suse.com/
My solution is to cancel the scrub, which is the simplest approach.
David's solution is to pause scrub/balance, using extra callbacks and a
more complex mechanism.
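To make the two options concrete, here is a minimal Python sketch of a chunk loop that polls a control flag between chunks: pausing blocks inside the loop and continues in place, while cancelling returns the position reached so that a later run can restart from it. `Request`, `ScrubControl`, `checkpoint()` and `scrub()` are hypothetical names; this does not reproduce either patch, and it ignores how the kernel would actually learn that a suspend is pending.

```python
import threading
from enum import Enum


class Request(Enum):
    RUN = "run"
    PAUSE = "pause"
    CANCEL = "cancel"


class ScrubControl:
    """Hypothetical control flag that a scrub loop polls between chunks."""

    def __init__(self):
        self._cond = threading.Condition()
        self._request = Request.RUN

    def set(self, request):
        with self._cond:
            self._request = request
            self._cond.notify_all()

    def checkpoint(self):
        """Block while paused; return True if the loop should stop."""
        with self._cond:
            while self._request is Request.PAUSE:
                self._cond.wait()
            return self._request is Request.CANCEL


def scrub(n_chunks, ctl, process, start=0):
    """Process chunks start..n_chunks-1, checking `ctl` before each one.

    Returns (status, next_chunk). After a cancel, next_chunk is where a
    later run could resume, i.e. the progress a caller would have to save."""
    for i in range(start, n_chunks):
        if ctl.checkpoint():
            return "cancelled", i
        process(i)
    return "finished", n_chunks


if __name__ == "__main__":
    ctl = ScrubControl()
    seen = []

    def process(i):
        seen.append(i)
        if i == 2:
            ctl.set(Request.CANCEL)  # e.g. a suspend request arrives

    status, pos = scrub(5, ctl, process)
    print(status, pos, seen)  # cancelled 3 [0, 1, 2]
    ctl.set(Request.RUN)
    print(scrub(5, ctl, process, start=pos), seen)
    # ('finished', 5) [0, 1, 2, 3, 4]
```

In the cancel case the caller is left holding `pos`, which plays the role of saved scrub progress; in the pause case the loop simply sleeps in checkpoint() and carries on once the flag is cleared. Either way the same chunks end up processed.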
Thanks,
Qu
>
> So how did this ever work? Folks do use btrfsmaintenance with scrub and trim timers, and a laptop can sleep at any time. We can't inhibit sleep indefinitely.
>
> Perhaps scrub and balance could be paused when a PM suspend/hibernate is requested? That would make them a non-factor.
>
> Chris Murphy
>
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Chris Murphy @ 2025-09-15 23:51 UTC
To: Qu WenRuo, Btrfs BTRFS
On Mon, Sep 15, 2025, at 7:37 PM, Qu Wenruo wrote:
> On 2025/9/16 08:46, Chris Murphy wrote:
>> The storage stack may be relevant: USB flash drive -> dm-crypt -> Btrfs
>
> As for your original report of the system not responding: since btrfs is
> only blocking the suspend, there should be no additional reason for the
> system to hang, unless some other corner case is involved, such as the
> USB device being powered off.
I can successfully suspend the laptop with the same USB stick inserted and the file system mounted, as long as no scrub is running.
> If you can reproduce the bug, please capture the dying messages using
> something like netconsole.
That's difficult. Neither of the two computers I have available has wired Ethernet, and previously, trying to get netconsole working over Wi-Fi defeated me.
> Yes, that's already a known problem, and both David and I have worked on
> it in the past:
>
> https://lore.kernel.org/linux-btrfs/9606fae20bff6c1fbe14dc7b067f3b333c2a955b.1751847905.git.wqu@suse.com/
>
> https://lore.kernel.org/linux-btrfs/20250708132540.28285-1-dsterba@suse.com/
>
>
> My solution is to cancel the scrub, which is the simplest approach.
> David's solution is to pause scrub/balance, using extra callbacks and a
> more complex mechanism.
Users need to become aware that the scrub/balance was interrupted or cancelled, and in either case they have to intervene manually. So I'm not sure pause has a big advantage over cancel.
Also, for a read-write scrub, is it appropriate for the tracking information kept in the /var/lib/btrfs file to be located instead in, e.g., the dev tree on the device being scrubbed or balanced?
--
Chris Murphy
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Askar Safin @ 2025-10-12 8:52 UTC
To: lists; +Cc: linux-btrfs
"Chris Murphy" <lists@colorremedies.com>:
> I started a scrub and walked away; when I came back, the system appeared hung, with a black screen and no response.
I suspect there is an interplay between two issues here.
The first is the btrfs kernel bug Qu Wenruo is talking about.
The second is a systemd issue, which amplifies this kernel bug.
The systemd bug turns a simple "suspend doesn't work, but the system continues to
operate normally" into "a reboot is needed".
I wrote about this here: https://github.com/systemd/systemd/issues/38337
The bug is fixed in mainline and stable versions of systemd.
So you should just upgrade systemd. The fix has been backported to stable
systemd versions, so it should reach all stable Linux distros on its own.
Suspend still will not work while a scrub is running, but at least your
system will remain operational after a failed suspend attempt.
If this still doesn't help, please tell me your full systemd version
and distro version.
--
Askar Safin
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Chris Murphy @ 2025-10-12 22:10 UTC
To: Askar Safin; +Cc: Btrfs BTRFS
On Sun, Oct 12, 2025, at 4:52 AM, Askar Safin wrote:
> "Chris Murphy" <lists@colorremedies.com>:
>> I started a scrub and walked away; when I came back, the system appeared hung, with a black screen and no response.
>
> I suspect there is an interplay between two issues here.
> The first is the btrfs kernel bug Qu Wenruo is talking about.
> The second is a systemd issue, which amplifies this kernel bug.
>
> The systemd bug turns a simple "suspend doesn't work, but the system continues to
> operate normally" into "a reboot is needed".
>
> I wrote about this here: https://github.com/systemd/systemd/issues/38337
>
> The bug is fixed in mainline and stable versions of systemd.
Thanks for the response.
I've since moved to Fedora 43 (pre-release), which has systemd-258-1.fc43.x86_64.
Fedora 42 still has systemd-257.9-2.fc42, which is what I was running at the time of the problem.
--
Chris Murphy
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Chris Murphy @ 2025-10-16 22:15 UTC
To: Askar Safin; +Cc: Btrfs BTRFS
On Sun, Oct 12, 2025, at 6:10 PM, Chris Murphy wrote:
> On Sun, Oct 12, 2025, at 4:52 AM, Askar Safin wrote:
>> "Chris Murphy" <lists@colorremedies.com>:
>>> I started a scrub and walked away; when I came back, the system appeared hung, with a black screen and no response.
>>
>> I suspect there is an interplay between two issues here.
>> The first is the btrfs kernel bug Qu Wenruo is talking about.
>> The second is a systemd issue, which amplifies this kernel bug.
>>
>> The systemd bug turns a simple "suspend doesn't work, but the system continues to
>> operate normally" into "a reboot is needed".
>>
>> I wrote about this here: https://github.com/systemd/systemd/issues/38337
>>
>> The bug is fixed in mainline and stable versions of systemd.
>
> Thanks for the response.
>
> I've since moved to Fedora 43 (pre-release), which has
> systemd-258-1.fc43.x86_64.
>
> Fedora 42 still has systemd-257.9-2.fc42, which is what I was running at
> the time of the problem.
I'm told the fix was already in the systemd I had when I ran into the problem: the fix is in 257.8, and I had 257.9-2.
--
Chris Murphy
* Re: 6.17rc5: btrfs scrub, Freezing user space processes failed
From: Askar Safin @ 2025-10-17 11:01 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
On Fri, Oct 17, 2025 at 1:15 AM Chris Murphy <lists@colorremedies.com> wrote:
> I'm told the fix was already in the systemd I had when I ran into the problem: the fix is in 257.8, and I had 257.9-2.
Then systemd is not involved.
Anyway, a fix for the original kernel bug is in progress.
I tested this patch, and it works:
https://lore.kernel.org/linux-btrfs/5517a3cd-1afa-4db0-bf8b-439f3ba410ed@gmx.com/
--
Askar Safin
Thread overview: 8+ messages
2025-09-13 21:57 6.17rc5: btrfs scrub, Freezing user space processes failed Chris Murphy
2025-09-15 23:16 ` Chris Murphy
2025-09-15 23:37 ` Qu Wenruo
2025-09-15 23:51 ` Chris Murphy
2025-10-12 8:52 ` Askar Safin
2025-10-12 22:10 ` Chris Murphy
2025-10-16 22:15 ` Chris Murphy
2025-10-17 11:01 ` Askar Safin