* 6.12 WARNING in netfs_consume_read_data()
@ 2024-11-18 23:04 Max Kellermann
2024-11-29 12:41 ` David Howells
0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-11-18 23:04 UTC (permalink / raw)
To: David Howells, Jeff Layton, netfs, linux-fsdevel, linux-kernel
Hi David & netfs developers,
I tried the new Linux kernel 6.12 today and it quickly reported a bug:
------------[ cut here ]------------
WARNING: CPU: 13 PID: 0 at kernel/softirq.c:361 __local_bh_enable_ip+0x37/0x50
Modules linked in:
CPU: 13 UID: 0 PID: 0 Comm: swapper/13 Not tainted 6.12.0-cm4all1-hp+ #236
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS
U30 09/05/2019
RIP: 0010:__local_bh_enable_ip+0x37/0x50
Code: 00 0f 00 75 25 83 ee 01 f7 de 65 01 35 2a 93 d8 70 65 f7 05 1f
93 d8 70 00 ff ff 00 74 10 65 ff 0d 16 93 d8 70 c3 cc cc cc cc <0f> 0b
eb d7 65 66 83 3d 24 93 d8 70 00 74 e5 e8 45 ff ff ff eb de
RSP: 0018:ffff979300464d30 EFLAGS: 00010006
RAX: dead000000000122 RBX: ffff8c5cd1045b00 RCX: ffff8c5cdeff7800
RDX: ffff8c5d022f91f0 RSI: 0000000000000200 RDI: ffffffff8f5ee1cc
RBP: 0000000000000000 R08: 0000000000000002 R09: ffff8cba7cf6eb68
R10: ffffffff910621e0 R11: 0000000000000001 R12: ffff8c5d022f9368
R13: 0000000000001000 R14: ffff8c5d022f9368 R15: 0000000000001000
FS: 0000000000000000(0000) GS:ffff8cba7cf40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ea9986d368 CR3: 000000603ee2e005 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? __warn+0x81/0x110
? __local_bh_enable_ip+0x37/0x50
? report_bug+0x14c/0x170
? handle_bug+0x53/0x90
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? netfs_consume_read_data.isra.0+0x3fc/0xa70
? __local_bh_enable_ip+0x37/0x50
netfs_consume_read_data.isra.0+0x3fc/0xa70
? __pfx_cachefiles_read_complete+0x10/0x10
netfs_read_subreq_terminated+0x265/0x360
cachefiles_read_complete+0x45/0xf0
iomap_dio_bio_end_io+0x122/0x160
blk_update_request+0xf1/0x3e0
scsi_end_request+0x27/0x190
scsi_io_completion+0x43/0x6c0
pqi_irq_handler+0x108/0xcd0
__handle_irq_event_percpu+0x43/0x160
handle_irq_event+0x27/0x70
handle_edge_irq+0x82/0x220
__common_interrupt+0x37/0xb0
common_interrupt+0x74/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:cpuidle_enter_state+0xba/0x3c0
Code: 00 e8 7a 06 1d ff e8 45 f7 ff ff 8b 53 04 49 89 c5 0f 1f 44 00
00 31 ff e8 83 38 1c ff 45 84 ff 0f 85 16 01 00 00 fb 45 85 f6 <0f> 88
76 01 00 00 48 8b 04 24 49 63 ce 48 6b d1 68 49 29 c5 48 89
RSP: 0018:ffff97930015fe98 EFLAGS: 00000206
RAX: ffff8cba7cf40000 RBX: ffffb792fe94f250 RCX: 000000000000001f
RDX: 000000000000000d RSI: 0000000037c86db9 RDI: 0000000000000000
RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000020
R10: 0000000000000003 R11: 0000000000000015 R12: ffffffff9125b220
R13: 000006c7b80e3e03 R14: 0000000000000003 R15: 0000000000000000
? cpuidle_enter_state+0xad/0x3c0
cpuidle_enter+0x29/0x40
do_idle+0x19f/0x200
cpu_startup_entry+0x25/0x30
start_secondary+0xf3/0x100
common_startup_64+0x13e/0x148
</TASK>
---[ end trace 0000000000000000 ]---
Apparently, the netfs code doesn't want to be called from hardirq
context, but the cachefiles read completion callback may run in a
hardirq handler.
This source file is new in 6.12, so this regression may have been
caused by David Howells' commit ee4cdf7ba857 ("netfs: Speed up
buffered reading").
Max
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-11-18 23:04 6.12 WARNING in netfs_consume_read_data() Max Kellermann
@ 2024-11-29 12:41 ` David Howells
2024-11-29 21:06 ` Max Kellermann
2024-11-29 22:29 ` Max Kellermann
0 siblings, 2 replies; 14+ messages in thread
From: David Howells @ 2024-11-29 12:41 UTC (permalink / raw)
To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel
Hi Max,
Could you try:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback
I think the patches there should fix it.
However, as these won't make it in until the next merge window, I'll try to
work out an interim fix.
David
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-11-29 12:41 ` David Howells
@ 2024-11-29 21:06 ` Max Kellermann
2024-11-29 22:29 ` Max Kellermann
1 sibling, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2024-11-29 21:06 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Fri, Nov 29, 2024 at 1:41 PM David Howells <dhowells@redhat.com> wrote:
> Could you try:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback
>
> I think the patches there should fix it.
Confirmed. I have merged this branch on top of 6.12.1, and the problem
no longer occurs. Thanks!
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-11-29 12:41 ` David Howells
2024-11-29 21:06 ` Max Kellermann
@ 2024-11-29 22:29 ` Max Kellermann
2024-12-06 15:08 ` Max Kellermann
1 sibling, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-11-29 22:29 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Fri, Nov 29, 2024 at 1:41 PM David Howells <dhowells@redhat.com> wrote:
> Could you try:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback
>
> I think the patches there should fix it.
I have encountered multiple hangs with that branch; for example:
[<0>] folio_wait_private_2_killable+0xfb/0x180
[<0>] netfs_write_begin+0xe0/0x450
[<0>] ceph_write_begin+0x27/0x50
[<0>] generic_perform_write+0xcd/0x280
[<0>] ceph_write_iter+0x4d9/0x640
[<0>] iter_file_splice_write+0x308/0x550
[<0>] splice_file_range_actor+0x29/0x40
[<0>] splice_direct_to_actor+0xee/0x270
[<0>] splice_file_range+0x80/0xc0
[<0>] ceph_copy_file_range+0xb5/0x5b0
[<0>] vfs_copy_file_range+0x320/0x5b0
[<0>] __x64_sys_copy_file_range+0xef/0x200
[<0>] do_syscall_64+0x64/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
I can't be 100% sure if this was caused by your branch or if this is
just another 6.12 regression, but I haven't seen this with 6.11.
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-11-29 22:29 ` Max Kellermann
@ 2024-12-06 15:08 ` Max Kellermann
2024-12-06 15:13 ` Max Kellermann
2024-12-06 16:29 ` Max Kellermann
0 siblings, 2 replies; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 15:08 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Fri, Nov 29, 2024 at 11:29 PM Max Kellermann
<max.kellermann@ionos.com> wrote:
> I have encountered multiple hangs with that branch; for example:
Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):
[<0>] folio_wait_bit_common+0x23a/0x4f0
[<0>] folio_wait_private_2+0x37/0x70
[<0>] netfs_invalidate_folio+0x168/0x520
[<0>] truncate_cleanup_folio+0x281/0x340
[<0>] truncate_inode_pages_range+0x1bb/0x780
[<0>] ceph_evict_inode+0x17e/0x6b0
[<0>] evict+0x331/0x780
[<0>] __dentry_kill+0x17b/0x4f0
[<0>] dput+0x2a6/0x4a0
[<0>] __fput+0x36d/0x910
[<0>] __x64_sys_close+0x78/0xd0
[<0>] do_syscall_64+0x64/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
netfs in 6.12 remains pretty broken, both with and without your fixes.
The sad thing is that 6.11 has just been EOLed. Linux 6.6 is the only
sane kernel.
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-12-06 15:08 ` Max Kellermann
@ 2024-12-06 15:13 ` Max Kellermann
2024-12-06 16:29 ` Max Kellermann
1 sibling, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 15:13 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Fri, Dec 6, 2024 at 4:08 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):
>
> [<0>] folio_wait_bit_common+0x23a/0x4f0
> [<0>] folio_wait_private_2+0x37/0x70
> [<0>] netfs_invalidate_folio+0x168/0x520
> [<0>] truncate_cleanup_folio+0x281/0x340
> [<0>] truncate_inode_pages_range+0x1bb/0x780
> [<0>] ceph_evict_inode+0x17e/0x6b0
> [<0>] evict+0x331/0x780
> [<0>] __dentry_kill+0x17b/0x4f0
> [<0>] dput+0x2a6/0x4a0
> [<0>] __fput+0x36d/0x910
> [<0>] __x64_sys_close+0x78/0xd0
> [<0>] do_syscall_64+0x64/0x100
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
This might help you understand the problem:
INFO: task cp:3345 blocked for more than 122 seconds.
Not tainted 6.12.3-cm4all0-hp+ #297
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:cp state:D stack:0 pid:3345 tgid:3345
ppid:3331 flags:0x00000002
Call Trace:
<TASK>
__schedule+0xc34/0x4df0
? __pfx___schedule+0x10/0x10
? lock_release+0x206/0x660
? schedule+0x283/0x340
? __pfx_lock_release+0x10/0x10
? schedule+0x1e8/0x340
schedule+0xdc/0x340
io_schedule+0xc0/0x130
folio_wait_bit_common+0x23a/0x4f0
? __pfx_folio_wait_bit_common+0x10/0x10
? __pfx_wake_page_function+0x10/0x10
? __pfx_truncate_folio_batch_exceptionals.part.0+0x10/0x10
folio_wait_private_2+0x37/0x70
netfs_invalidate_folio+0x168/0x520
? ceph_invalidate_folio+0x114/0x2a0
truncate_cleanup_folio+0x281/0x340
truncate_inode_pages_range+0x1bb/0x780
? __pfx_truncate_inode_pages_range+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
? find_held_lock+0x2d/0x110
? do_raw_spin_unlock+0x54/0x220
ceph_evict_inode+0x17e/0x6b0
? lock_release+0x206/0x660
? __pfx_ceph_evict_inode+0x10/0x10
? __pfx_lock_release+0x10/0x10
? do_raw_spin_lock+0x12d/0x270
? __pfx_do_raw_spin_lock+0x10/0x10
evict+0x331/0x780
? __pfx_evict+0x10/0x10
? do_raw_spin_unlock+0x54/0x220
? _raw_spin_unlock+0x1f/0x30
? iput.part.0+0x3d0/0x670
__dentry_kill+0x17b/0x4f0
dput+0x2a6/0x4a0
__fput+0x36d/0x910
__x64_sys_close+0x78/0xd0
do_syscall_64+0x64/0x100
? syscall_exit_to_user_mode+0x57/0x120
? do_syscall_64+0x70/0x100
? do_raw_spin_unlock+0x54/0x220
? _raw_spin_unlock+0x1f/0x30
? generic_fadvise+0x210/0x590
? __pfx_generic_fadvise+0x10/0x10
? syscall_exit_to_user_mode+0x57/0x120
? __pfx___seccomp_filter+0x10/0x10
? fdget+0x53/0x430
? __pfx_do_sys_openat2+0x10/0x10
? __x64_sys_fadvise64+0x139/0x180
? syscall_exit_to_user_mode+0x57/0x120
? do_syscall_64+0x70/0x100
? __x64_sys_openat+0x135/0x1d0
? __pfx___x64_sys_openat+0x10/0x10
? syscall_exit_to_user_mode+0x57/0x120
? do_syscall_64+0x70/0x100
? irqentry_exit_to_user_mode+0x3d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f66375bd960
RSP: 002b:00007ffd5bcd65a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f66375bd960
RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffd5bcd6990 R08: 7fffffffc0000000 R09: 0000000000000000
R10: 00007f66374dc498 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000001
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/163:
#0: ffffffffb5829b80 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x64/0x280
2 locks held by kworker/14:1/476:
#0: ffff88815cb2b548 ((wq_completion)ceph-cap){....}-{0:0}, at:
process_one_work+0xdea/0x14f0
#1: ffff88810ca97da0
((work_completion)(&mdsc->cap_reclaim_work)){....}-{0:0}, at:
process_one_work+0x743/0x14f0
=============================================
INFO: task kworker/14:1:476 blocked for more than 122 seconds.
Not tainted 6.12.3-cm4all0-hp+ #297
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/14:1 state:D stack:0 pid:476 tgid:476 ppid:2
flags:0x00004000
Workqueue: ceph-cap ceph_cap_reclaim_work
Call Trace:
<TASK>
__schedule+0xc34/0x4df0
? __pfx___schedule+0x10/0x10
? lock_release+0x206/0x660
? schedule+0x283/0x340
? __pfx_lock_release+0x10/0x10
? kick_pool+0x1da/0x530
schedule+0xdc/0x340
io_schedule+0xc0/0x130
folio_wait_bit_common+0x23a/0x4f0
? __pfx_folio_wait_bit_common+0x10/0x10
? __pfx_wake_page_function+0x10/0x10
? __pfx_truncate_folio_batch_exceptionals.part.0+0x10/0x10
folio_wait_private_2+0x37/0x70
netfs_invalidate_folio+0x168/0x520
? ceph_invalidate_folio+0x114/0x2a0
truncate_cleanup_folio+0x281/0x340
truncate_inode_pages_range+0x1bb/0x780
? __pfx_truncate_inode_pages_range+0x10/0x10
? __lock_acquire.constprop.0+0x598/0x13e0
? release_sock+0x1b/0x180
? reacquire_held_locks+0x1e9/0x460
? release_sock+0x1b/0x180
? find_held_lock+0x2d/0x110
? lock_release+0x206/0x660
? truncate_inode_pages_final+0x59/0x80
? __pfx_lock_release+0x10/0x10
? do_raw_spin_lock+0x12d/0x270
? __pfx_do_raw_spin_lock+0x10/0x10
? find_held_lock+0x2d/0x110
? do_raw_spin_unlock+0x54/0x220
ceph_evict_inode+0x17e/0x6b0
? lock_release+0x206/0x660
? __pfx_ceph_evict_inode+0x10/0x10
? __pfx_lock_release+0x10/0x10
? lock_is_held_type+0xdb/0x110
evict+0x331/0x780
? __pfx_evict+0x10/0x10
? do_raw_spin_unlock+0x54/0x220
? _raw_spin_unlock+0x1f/0x30
? iput.part.0+0x3d0/0x670
__dentry_kill+0x17b/0x4f0
dput+0x2a6/0x4a0
__dentry_leases_walk+0x6c6/0x10d0
? do_raw_spin_lock+0x12d/0x270
? __pfx___dentry_leases_walk+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
ceph_trim_dentries+0x1b1/0x260
? __pfx_ceph_trim_dentries+0x10/0x10
? lock_acquire+0x11f/0x290
? process_one_work+0x743/0x14f0
ceph_cap_reclaim_work+0x19/0xc0
process_one_work+0x7b4/0x14f0
? __pfx_process_one_work+0x10/0x10
? assign_work+0x16c/0x240
? lock_is_held_type+0x9a/0x110
worker_thread+0x52b/0xe40
? do_raw_spin_unlock+0x54/0x220
? __kthread_parkme+0x95/0x120
? __pfx_worker_thread+0x10/0x10
kthread+0x28a/0x350
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2d/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-12-06 15:08 ` Max Kellermann
2024-12-06 15:13 ` Max Kellermann
@ 2024-12-06 16:29 ` Max Kellermann
2024-12-06 21:13 ` David Howells
1 sibling, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 16:29 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Fri, Dec 6, 2024 at 4:08 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):
(Correction: this was 6.12.3, not 6.12.2)
I tried with 6.12.3 + dhowells/netfs-writeback; David's branch solved
many problems and it took much longer to trigger the hang, but it
eventually occurred:
INFO: task bash:6599 blocked for more than 122 seconds.
Not tainted 6.12.3-cm4all0-hp+ #298
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:bash state:D stack:0 pid:6599 tgid:6599
ppid:6598 flags:0x00000006
Call Trace:
<TASK>
__schedule+0xc34/0x4df0
? is_dynamic_key+0x120/0x150
? __pfx___schedule+0x10/0x10
? lock_release+0x206/0x660
? schedule+0x283/0x340
? __pfx_lock_release+0x10/0x10
? schedule+0x1e8/0x340
schedule+0xdc/0x340
schedule_preempt_disabled+0xa/0x10
rwsem_down_read_slowpath+0x6ba/0xd00
? __pfx_rwsem_down_read_slowpath+0x10/0x10
? kernel_text_address+0xb8/0x150
? lock_acquire+0x11f/0x290
? ceph_start_io_read+0x19/0x80
down_read+0xcd/0x220
? __pfx_down_read+0x10/0x10
? do_sys_openat2+0x106/0x160
? stack_trace_save+0x96/0xd0
? __pfx_stack_trace_save+0x10/0x10
ceph_start_io_read+0x19/0x80
ceph_read_iter+0x2e2/0xe70
? __pfx_ceph_read_iter+0x10/0x10
? psi_task_switch+0x256/0x810
? find_held_lock+0x2d/0x110
? lock_release+0x206/0x660
? finish_task_switch.isra.0+0x1db/0xa40
vfs_read+0x6e1/0xc40
? lock_acquire+0x11f/0x290
? finish_task_switch.isra.0+0x129/0xa40
? __pfx_vfs_read+0x10/0x10
? finish_task_switch.isra.0+0x225/0xa40
? fdget_pos+0x1b3/0x540
? __pfx___seccomp_filter+0x10/0x10
ksys_read+0xee/0x1c0
? __pfx_ksys_read+0x10/0x10
? lock_release+0x206/0x660
? __ceph_do_getattr+0xe8/0x380
do_syscall_64+0x64/0x100
? fdget_raw+0x53/0x390
? __do_sys_newfstatat+0x86/0xd0
? __pfx___do_sys_newfstatat+0x10/0x10
? syscall_exit_to_user_mode+0x57/0x120
? do_syscall_64+0x70/0x100
? irqentry_exit_to_user_mode+0x3d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f71fb04f21d
RSP: 002b:00007ffdd516a918 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f71fb04f21d
RDX: 0000000000003003 RSI: 00005597dc55d4d0 RDI: 0000000000000003
RBP: 0000000000003003 R08: 00007f71fb12a020 R09: 00007f71fb12a020
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
R13: 00005597dc485340 R14: 00005597dc55d4d0 R15: 00005597be8d7524
</TASK>
INFO: task bash:6614 blocked for more than 122 seconds.
Not tainted 6.12.3-cm4all0-hp+ #298
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:bash state:D stack:0 pid:6614 tgid:6614
ppid:6613 flags:0x00000002
Call Trace:
<TASK>
__schedule+0xc34/0x4df0
? __free_insn_slot+0x370/0x3d0
? __pfx___schedule+0x10/0x10
? lock_release+0x206/0x660
? schedule+0x283/0x340
? __pfx_lock_release+0x10/0x10
? schedule+0x1e8/0x340
schedule+0xdc/0x340
schedule_preempt_disabled+0xa/0x10
rwsem_down_read_slowpath+0x6ba/0xd00
? __pfx_rwsem_down_read_slowpath+0x10/0x10
? kasan_save_stack+0x1c/0x40
? kasan_save_track+0x10/0x30
? lock_acquire+0x11f/0x290
? ceph_start_io_read+0x19/0x80
? find_held_lock+0x2d/0x110
down_read+0xcd/0x220
? __ceph_caps_issued_mask+0x416/0xa10
? __pfx_down_read+0x10/0x10
ceph_start_io_read+0x19/0x80
ceph_read_iter+0x2e2/0xe70
? _copy_to_user+0x50/0x70
? __pfx_ceph_read_iter+0x10/0x10
? fdget_raw+0x53/0x390
vfs_read+0x6e1/0xc40
? __do_sys_newfstatat+0x86/0xd0
? __pfx___do_sys_newfstatat+0x10/0x10
? __pfx_vfs_read+0x10/0x10
? fdget_pos+0x1b3/0x540
? __pfx___seccomp_filter+0x10/0x10
ksys_read+0xee/0x1c0
? __pfx_ksys_read+0x10/0x10
do_syscall_64+0x64/0x100
? do_user_addr_fault+0x401/0x8f0
? find_held_lock+0x59/0x110
? find_held_lock+0x2d/0x110
? lock_release+0x206/0x660
? do_user_addr_fault+0x45e/0x8f0
? __pfx_lock_release+0x10/0x10
? do_user_addr_fault+0x401/0x8f0
? do_user_addr_fault+0x463/0x8f0
? irqentry_exit_to_user_mode+0x3d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa84a43a21d
RSP: 002b:00007ffec8720278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa84a43a21d
RDX: 0000000000003003 RSI: 0000565007f224d0 RDI: 0000000000000003
RBP: 0000000000003003 R08: 00007fa84a515020 R09: 00007fa84a515020
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
R13: 0000565007e4a340 R14: 0000565007f224d0 R15: 0000565005ba8524
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/163:
#0: ffffffffae629b80 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x64/0x280
2 locks held by bash/3365:
#0: ffff8881661803e0 (sb_writers#19){....}-{0:0}, at: ksys_write+0xee/0x1c0
#1: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at:
ceph_start_io_write+0x15/0x30
1 lock held by bash/6599:
#0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at:
ceph_start_io_read+0x19/0x80
1 lock held by bash/6614:
#0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at:
ceph_start_io_read+0x19/0x80
=============================================
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-12-06 16:29 ` Max Kellermann
@ 2024-12-06 21:13 ` David Howells
2024-12-07 18:39 ` Max Kellermann
0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-06 21:13 UTC (permalink / raw)
To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel
Max Kellermann <max.kellermann@ionos.com> wrote:
>
> I tried with 6.12.3 + dhowells/netfs-writeback; David's branch solved
> many problems and it took much longer to trigger the hang, but it
> eventually occurred:
Can you get some tracing for me?
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_folio/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_read/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_write/enable
(And compress the result before posting)
Thanks,
David
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-12-06 21:13 ` David Howells
@ 2024-12-07 18:39 ` Max Kellermann
2024-12-08 10:20 ` Max Kellermann
0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-07 18:39 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 997 bytes --]
On Fri, Dec 6, 2024 at 10:13 PM David Howells <dhowells@redhat.com> wrote:
> Can you get some tracing for me?
Here's a Brotli-compressed trace (of 6.12.3 +
dhowells/netfs-writeback). I can reproduce it this way:
1. log in via SSH (cm4all-lukko is our SSH server)
2. "cp" a file on the Ceph home directory
3. log out - bash hangs inside write():
[<0>] folio_wait_bit_common+0x23a/0x4f0
[<0>] folio_wait_private_2_killable+0x3b/0x80
[<0>] netfs_write_begin+0x5a5/0xb90
[<0>] ceph_write_begin+0x6f/0x150
[<0>] generic_perform_write+0x253/0x7d0
[<0>] ceph_write_iter+0xafd/0x1010
[<0>] vfs_write+0x5bc/0x1030
[<0>] ksys_write+0xee/0x1c0
[<0>] do_syscall_64+0x64/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
bash tries to write the history file when logging out and that hangs.
This apparently always happens after somebody does copy_file_range on
Ceph - "cp" in my example.
Triggering this is trivial for me, so if you need any more data, I can get it.
Max
[-- Attachment #2: trace.br --]
[-- Type: application/octet-stream, Size: 178978 bytes --]
* Re: 6.12 WARNING in netfs_consume_read_data()
2024-12-07 18:39 ` Max Kellermann
@ 2024-12-08 10:20 ` Max Kellermann
2024-12-09 13:14 ` [PATCH] netfs: Fix ceph copy to cache on write-begin David Howells
0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-08 10:20 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Sat, Dec 7, 2024 at 7:39 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Here's a Brotli-compressed trace (of 6.12.3 +
> dhowells/netfs-writeback). I can reproduce it this way:
>
> 1. log in via SSH (cm4all-lukko is our SSH server)
> 2. "cp" a file on the Ceph home directory
> 3. log out - bash hangs inside write():
Today, I took the time to bisect it, and the bisect arrived at
first-bad commit 796a4049640b54cb1daf9e7fe543292c5ca02c74 ("netfs: In
readahead, put the folio refs as soon extracted").
* [PATCH] netfs: Fix ceph copy to cache on write-begin
2024-12-08 10:20 ` Max Kellermann
@ 2024-12-09 13:14 ` David Howells
2024-12-09 13:31 ` Max Kellermann
0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-09 13:14 UTC (permalink / raw)
To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel
Hi Max,
Could you try this?
David
---
netfs: Fix ceph copy to cache on write-begin
At the end of netfs_unlock_read_folio() in which folios are marked
appropriately for copying to the cache (either by being marked dirty
and having their private data set or by having PG_private_2 set) and then
unlocked, the folio_queue struct has the entry pointing to the folio
cleared. This presents a problem for netfs_pgpriv2_write_to_the_cache(),
which is used to write folios marked with PG_private_2 to the cache as it
expects to be able to trawl the folio_queue list thereafter to find the
relevant folios, leading to a hang.
Fix this by not clearing the folio_queue entry if we're going to do the
deprecated copy-to-cache. The clearance will be done instead as the folios
are written to the cache.
This can be reproduced by starting cachefiles, mounting a ceph filesystem
with "-o fsc" and writing to it.
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/read_collect.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 849f40f64443..72a16222b63b 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -62,10 +62,14 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq,
} else {
trace_netfs_folio(folio, netfs_folio_trace_read_done);
}
+
+ folioq_clear(folioq, slot);
} else {
// TODO: Use of PG_private_2 is deprecated.
if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
netfs_pgpriv2_mark_copy_to_cache(subreq, rreq, folioq, slot);
+ else
+ folioq_clear(folioq, slot);
}
if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
@@ -77,8 +81,6 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq,
folio_unlock(folio);
}
}
-
- folioq_clear(folioq, slot);
}
/*
* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
2024-12-09 13:14 ` [PATCH] netfs: Fix ceph copy to cache on write-begin David Howells
@ 2024-12-09 13:31 ` Max Kellermann
2024-12-12 16:00 ` David Howells
0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-09 13:31 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Mon, Dec 9, 2024 at 2:14 PM David Howells <dhowells@redhat.com> wrote:
> Could you try this?
No change, still hangs immediately on the first try.
* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
2024-12-09 13:31 ` Max Kellermann
@ 2024-12-12 16:00 ` David Howells
2025-01-07 10:58 ` Max Kellermann
0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-12 16:00 UTC (permalink / raw)
To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel
How about if you add the attached?
For convenience, I've put the outstanding fix patches I have here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
David
---
commit d0bc2ecca996105f55da22e8867905ca1dad7c8f
Author: David Howells <dhowells@redhat.com>
Date: Thu Dec 12 15:26:24 2024 +0000
netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
When the caching for a cookie is temporarily disabled (e.g. due to a DIO
write on that file), future copying to the cache for that file is disabled
until all fds open on that file are closed. However, if netfslib is using
the deprecated PG_private_2 method (such as is currently used by ceph), and
decides it wants to copy to the cache, netfs_advance_write() will just bail
at the first check seeing that the cache stream is unavailable, and
indicate that it dealt with all the content.
This means that we have no subrequests to provide notifications to drive
the state machine or even to pin the request and the request just gets
discarded, leaving the folios with PG_private_2 set.
Fix this by jumping directly to cancel the request if the cache is not
available. That way, we don't remove mark3 from the folio_queue list and
netfs_pgpriv2_cancel() will clean up the folios.
This was found by running the generic/013 xfstest against ceph with an active
cache and the "-o fsc" option passed to ceph. That would usually hang.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index ba5af89d37fa..54d5004fec18 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -170,6 +170,10 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
+ if (!wreq->io_streams[1].avail) {
+ netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+ goto couldnt_start;
+ }
for (;;) {
error = netfs_pgpriv2_copy_folio(wreq, folio);
* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
2024-12-12 16:00 ` David Howells
@ 2025-01-07 10:58 ` Max Kellermann
0 siblings, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2025-01-07 10:58 UTC (permalink / raw)
To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel
On Thu, Dec 12, 2024 at 5:00 PM David Howells <dhowells@redhat.com> wrote:
> How about if you add the attached?
Still no change, first try led to hang, as always.
(Sorry for the late reply, I was offline on vacation)