public inbox for linux-fsdevel@vger.kernel.org
* 6.12 WARNING in netfs_consume_read_data()
@ 2024-11-18 23:04 Max Kellermann
  2024-11-29 12:41 ` David Howells
  0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-11-18 23:04 UTC (permalink / raw)
  To: David Howells, Jeff Layton, netfs, linux-fsdevel, linux-kernel

Hi David & netfs developers,

I tried the new Linux kernel 6.12 today and it quickly reported a bug:

 ------------[ cut here ]------------
 WARNING: CPU: 13 PID: 0 at kernel/softirq.c:361 __local_bh_enable_ip+0x37/0x50
 Modules linked in:
 CPU: 13 UID: 0 PID: 0 Comm: swapper/13 Not tainted 6.12.0-cm4all1-hp+ #236
 Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 09/05/2019
 RIP: 0010:__local_bh_enable_ip+0x37/0x50
 Code: 00 0f 00 75 25 83 ee 01 f7 de 65 01 35 2a 93 d8 70 65 f7 05 1f
93 d8 70 00 ff ff 00 74 10 65 ff 0d 16 93 d8 70 c3 cc cc cc cc <0f> 0b
eb d7 65 66 83 3d 24 93 d8 70 00 74 e5 e8 45 ff ff ff eb de
 RSP: 0018:ffff979300464d30 EFLAGS: 00010006
 RAX: dead000000000122 RBX: ffff8c5cd1045b00 RCX: ffff8c5cdeff7800
 RDX: ffff8c5d022f91f0 RSI: 0000000000000200 RDI: ffffffff8f5ee1cc
 RBP: 0000000000000000 R08: 0000000000000002 R09: ffff8cba7cf6eb68
 R10: ffffffff910621e0 R11: 0000000000000001 R12: ffff8c5d022f9368
 R13: 0000000000001000 R14: ffff8c5d022f9368 R15: 0000000000001000
 FS:  0000000000000000(0000) GS:ffff8cba7cf40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055ea9986d368 CR3: 000000603ee2e005 CR4: 00000000007706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <IRQ>
  ? __warn+0x81/0x110
  ? __local_bh_enable_ip+0x37/0x50
  ? report_bug+0x14c/0x170
  ? handle_bug+0x53/0x90
  ? exc_invalid_op+0x13/0x60
  ? asm_exc_invalid_op+0x16/0x20
  ? netfs_consume_read_data.isra.0+0x3fc/0xa70
  ? __local_bh_enable_ip+0x37/0x50
  netfs_consume_read_data.isra.0+0x3fc/0xa70
  ? __pfx_cachefiles_read_complete+0x10/0x10
  netfs_read_subreq_terminated+0x265/0x360
  cachefiles_read_complete+0x45/0xf0
  iomap_dio_bio_end_io+0x122/0x160
  blk_update_request+0xf1/0x3e0
  scsi_end_request+0x27/0x190
  scsi_io_completion+0x43/0x6c0
  pqi_irq_handler+0x108/0xcd0
  __handle_irq_event_percpu+0x43/0x160
  handle_irq_event+0x27/0x70
  handle_edge_irq+0x82/0x220
  __common_interrupt+0x37/0xb0
  common_interrupt+0x74/0xa0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x22/0x40
 RIP: 0010:cpuidle_enter_state+0xba/0x3c0
 Code: 00 e8 7a 06 1d ff e8 45 f7 ff ff 8b 53 04 49 89 c5 0f 1f 44 00
00 31 ff e8 83 38 1c ff 45 84 ff 0f 85 16 01 00 00 fb 45 85 f6 <0f> 88
76 01 00 00 48 8b 04 24 49 63 ce 48 6b d1 68 49 29 c5 48 89
 RSP: 0018:ffff97930015fe98 EFLAGS: 00000206
 RAX: ffff8cba7cf40000 RBX: ffffb792fe94f250 RCX: 000000000000001f
 RDX: 000000000000000d RSI: 0000000037c86db9 RDI: 0000000000000000
 RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000020
 R10: 0000000000000003 R11: 0000000000000015 R12: ffffffff9125b220
 R13: 000006c7b80e3e03 R14: 0000000000000003 R15: 0000000000000000
  ? cpuidle_enter_state+0xad/0x3c0
  cpuidle_enter+0x29/0x40
  do_idle+0x19f/0x200
  cpu_startup_entry+0x25/0x30
  start_secondary+0xf3/0x100
  common_startup_64+0x13e/0x148
  </TASK>
 ---[ end trace 0000000000000000 ]---

Apparently, the netfs code doesn't want to be called from hardirq
context, but the cachefiles read completion callback may run in a
hardirq handler.

This source file is new in 6.12, so this regression may have been
caused by David Howells's commit ee4cdf7ba857 ("netfs: Speed up
buffered reading").

Max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-11-18 23:04 6.12 WARNING in netfs_consume_read_data() Max Kellermann
@ 2024-11-29 12:41 ` David Howells
  2024-11-29 21:06   ` Max Kellermann
  2024-11-29 22:29   ` Max Kellermann
  0 siblings, 2 replies; 14+ messages in thread
From: David Howells @ 2024-11-29 12:41 UTC (permalink / raw)
  To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel

Hi Max,

Could you try:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback

I think the patches there should fix it.

However, as these won't make it upstream until the next merge window,
I'll try to work out an interim fix.

David



* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-11-29 12:41 ` David Howells
@ 2024-11-29 21:06   ` Max Kellermann
  2024-11-29 22:29   ` Max Kellermann
  1 sibling, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2024-11-29 21:06 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Fri, Nov 29, 2024 at 1:41 PM David Howells <dhowells@redhat.com> wrote:
> Could you try:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback
>
> I think the patches there should fix it.

Confirmed. I have merged this branch on top of 6.12.1, and the problem
no longer occurs. Thanks!


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-11-29 12:41 ` David Howells
  2024-11-29 21:06   ` Max Kellermann
@ 2024-11-29 22:29   ` Max Kellermann
  2024-12-06 15:08     ` Max Kellermann
  1 sibling, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-11-29 22:29 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Fri, Nov 29, 2024 at 1:41 PM David Howells <dhowells@redhat.com> wrote:
> Could you try:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-writeback
>
> I think the patches there should fix it.

I have encountered multiple hangs with that branch; for example:

 [<0>] folio_wait_private_2_killable+0xfb/0x180
 [<0>] netfs_write_begin+0xe0/0x450
 [<0>] ceph_write_begin+0x27/0x50
 [<0>] generic_perform_write+0xcd/0x280
 [<0>] ceph_write_iter+0x4d9/0x640
 [<0>] iter_file_splice_write+0x308/0x550
 [<0>] splice_file_range_actor+0x29/0x40
 [<0>] splice_direct_to_actor+0xee/0x270
 [<0>] splice_file_range+0x80/0xc0
 [<0>] ceph_copy_file_range+0xb5/0x5b0
 [<0>] vfs_copy_file_range+0x320/0x5b0
 [<0>] __x64_sys_copy_file_range+0xef/0x200
 [<0>] do_syscall_64+0x64/0x100
 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

I can't be 100% sure whether this was caused by your branch or whether
it's just another 6.12 regression, but I haven't seen this with 6.11.


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-11-29 22:29   ` Max Kellermann
@ 2024-12-06 15:08     ` Max Kellermann
  2024-12-06 15:13       ` Max Kellermann
  2024-12-06 16:29       ` Max Kellermann
  0 siblings, 2 replies; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 15:08 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Fri, Nov 29, 2024 at 11:29 PM Max Kellermann
<max.kellermann@ionos.com> wrote:
> I have encountered multiple hangs with that branch; for example:

Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):

[<0>] folio_wait_bit_common+0x23a/0x4f0
[<0>] folio_wait_private_2+0x37/0x70
[<0>] netfs_invalidate_folio+0x168/0x520
[<0>] truncate_cleanup_folio+0x281/0x340
[<0>] truncate_inode_pages_range+0x1bb/0x780
[<0>] ceph_evict_inode+0x17e/0x6b0
[<0>] evict+0x331/0x780
[<0>] __dentry_kill+0x17b/0x4f0
[<0>] dput+0x2a6/0x4a0
[<0>] __fput+0x36d/0x910
[<0>] __x64_sys_close+0x78/0xd0
[<0>] do_syscall_64+0x64/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

netfs in 6.12 remains pretty broken, both with and without your fixes.

The sad thing is that 6.11 has just been EOLed. Linux 6.6 is the only
sane kernel.


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-12-06 15:08     ` Max Kellermann
@ 2024-12-06 15:13       ` Max Kellermann
  2024-12-06 16:29       ` Max Kellermann
  1 sibling, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 15:13 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Fri, Dec 6, 2024 at 4:08 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):
>
> [<0>] folio_wait_bit_common+0x23a/0x4f0
> [<0>] folio_wait_private_2+0x37/0x70
> [<0>] netfs_invalidate_folio+0x168/0x520
> [<0>] truncate_cleanup_folio+0x281/0x340
> [<0>] truncate_inode_pages_range+0x1bb/0x780
> [<0>] ceph_evict_inode+0x17e/0x6b0
> [<0>] evict+0x331/0x780
> [<0>] __dentry_kill+0x17b/0x4f0
> [<0>] dput+0x2a6/0x4a0
> [<0>] __fput+0x36d/0x910
> [<0>] __x64_sys_close+0x78/0xd0
> [<0>] do_syscall_64+0x64/0x100
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This might help you understand the problem:

 INFO: task cp:3345 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #297
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:cp              state:D stack:0     pid:3345  tgid:3345   ppid:3331   flags:0x00000002
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? schedule+0x1e8/0x340
  schedule+0xdc/0x340
  io_schedule+0xc0/0x130
  folio_wait_bit_common+0x23a/0x4f0
  ? __pfx_folio_wait_bit_common+0x10/0x10
  ? __pfx_wake_page_function+0x10/0x10
  ? __pfx_truncate_folio_batch_exceptionals.part.0+0x10/0x10
  folio_wait_private_2+0x37/0x70
  netfs_invalidate_folio+0x168/0x520
  ? ceph_invalidate_folio+0x114/0x2a0
  truncate_cleanup_folio+0x281/0x340
  truncate_inode_pages_range+0x1bb/0x780
  ? __pfx_truncate_inode_pages_range+0x10/0x10
  ? __pfx_do_raw_spin_lock+0x10/0x10
  ? find_held_lock+0x2d/0x110
  ? do_raw_spin_unlock+0x54/0x220
  ceph_evict_inode+0x17e/0x6b0
  ? lock_release+0x206/0x660
  ? __pfx_ceph_evict_inode+0x10/0x10
  ? __pfx_lock_release+0x10/0x10
  ? do_raw_spin_lock+0x12d/0x270
  ? __pfx_do_raw_spin_lock+0x10/0x10
  evict+0x331/0x780
  ? __pfx_evict+0x10/0x10
  ? do_raw_spin_unlock+0x54/0x220
  ? _raw_spin_unlock+0x1f/0x30
  ? iput.part.0+0x3d0/0x670
  __dentry_kill+0x17b/0x4f0
  dput+0x2a6/0x4a0
  __fput+0x36d/0x910
  __x64_sys_close+0x78/0xd0
  do_syscall_64+0x64/0x100
  ? syscall_exit_to_user_mode+0x57/0x120
  ? do_syscall_64+0x70/0x100
  ? do_raw_spin_unlock+0x54/0x220
  ? _raw_spin_unlock+0x1f/0x30
  ? generic_fadvise+0x210/0x590
  ? __pfx_generic_fadvise+0x10/0x10
  ? syscall_exit_to_user_mode+0x57/0x120
  ? __pfx___seccomp_filter+0x10/0x10
  ? fdget+0x53/0x430
  ? __pfx_do_sys_openat2+0x10/0x10
  ? __x64_sys_fadvise64+0x139/0x180
  ? syscall_exit_to_user_mode+0x57/0x120
  ? do_syscall_64+0x70/0x100
  ? __x64_sys_openat+0x135/0x1d0
  ? __pfx___x64_sys_openat+0x10/0x10
  ? syscall_exit_to_user_mode+0x57/0x120
  ? do_syscall_64+0x70/0x100
  ? irqentry_exit_to_user_mode+0x3d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f66375bd960
 RSP: 002b:00007ffd5bcd65a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f66375bd960
 RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000003
 RBP: 00007ffd5bcd6990 R08: 7fffffffc0000000 R09: 0000000000000000
 R10: 00007f66374dc498 R11: 0000000000000202 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000001
  </TASK>

 Showing all locks held in the system:
 1 lock held by khungtaskd/163:
  #0: ffffffffb5829b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x64/0x280
 2 locks held by kworker/14:1/476:
  #0: ffff88815cb2b548 ((wq_completion)ceph-cap){....}-{0:0}, at: process_one_work+0xdea/0x14f0
  #1: ffff88810ca97da0 ((work_completion)(&mdsc->cap_reclaim_work)){....}-{0:0}, at: process_one_work+0x743/0x14f0

 =============================================
 INFO: task kworker/14:1:476 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #297
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/14:1    state:D stack:0     pid:476   tgid:476   ppid:2     flags:0x00004000
 Workqueue: ceph-cap ceph_cap_reclaim_work
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? kick_pool+0x1da/0x530
  schedule+0xdc/0x340
  io_schedule+0xc0/0x130
  folio_wait_bit_common+0x23a/0x4f0
  ? __pfx_folio_wait_bit_common+0x10/0x10
  ? __pfx_wake_page_function+0x10/0x10
  ? __pfx_truncate_folio_batch_exceptionals.part.0+0x10/0x10
  folio_wait_private_2+0x37/0x70
  netfs_invalidate_folio+0x168/0x520
  ? ceph_invalidate_folio+0x114/0x2a0
  truncate_cleanup_folio+0x281/0x340
  truncate_inode_pages_range+0x1bb/0x780
  ? __pfx_truncate_inode_pages_range+0x10/0x10
  ? __lock_acquire.constprop.0+0x598/0x13e0
  ? release_sock+0x1b/0x180
  ? reacquire_held_locks+0x1e9/0x460
  ? release_sock+0x1b/0x180
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x206/0x660
  ? truncate_inode_pages_final+0x59/0x80
  ? __pfx_lock_release+0x10/0x10
  ? do_raw_spin_lock+0x12d/0x270
  ? __pfx_do_raw_spin_lock+0x10/0x10
  ? find_held_lock+0x2d/0x110
  ? do_raw_spin_unlock+0x54/0x220
  ceph_evict_inode+0x17e/0x6b0
  ? lock_release+0x206/0x660
  ? __pfx_ceph_evict_inode+0x10/0x10
  ? __pfx_lock_release+0x10/0x10
  ? lock_is_held_type+0xdb/0x110
  evict+0x331/0x780
  ? __pfx_evict+0x10/0x10
  ? do_raw_spin_unlock+0x54/0x220
  ? _raw_spin_unlock+0x1f/0x30
  ? iput.part.0+0x3d0/0x670
  __dentry_kill+0x17b/0x4f0
  dput+0x2a6/0x4a0
  __dentry_leases_walk+0x6c6/0x10d0
  ? do_raw_spin_lock+0x12d/0x270
  ? __pfx___dentry_leases_walk+0x10/0x10
  ? __pfx_do_raw_spin_lock+0x10/0x10
  ceph_trim_dentries+0x1b1/0x260
  ? __pfx_ceph_trim_dentries+0x10/0x10
  ? lock_acquire+0x11f/0x290
  ? process_one_work+0x743/0x14f0
  ceph_cap_reclaim_work+0x19/0xc0
  process_one_work+0x7b4/0x14f0
  ? __pfx_process_one_work+0x10/0x10
  ? assign_work+0x16c/0x240
  ? lock_is_held_type+0x9a/0x110
  worker_thread+0x52b/0xe40
  ? do_raw_spin_unlock+0x54/0x220
  ? __kthread_parkme+0x95/0x120
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x28a/0x350
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2d/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-12-06 15:08     ` Max Kellermann
  2024-12-06 15:13       ` Max Kellermann
@ 2024-12-06 16:29       ` Max Kellermann
  2024-12-06 21:13         ` David Howells
  1 sibling, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-06 16:29 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Fri, Dec 6, 2024 at 4:08 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):

(Correction: this was 6.12.3, not 6.12.2)

I tried with 6.12.3 + dhowells/netfs-writeback; David's branch solved
many problems and it took much longer to trigger the hang, but it
eventually occurred:

 INFO: task bash:6599 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #298
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:bash            state:D stack:0     pid:6599  tgid:6599   ppid:6598   flags:0x00000006
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? is_dynamic_key+0x120/0x150
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? schedule+0x1e8/0x340
  schedule+0xdc/0x340
  schedule_preempt_disabled+0xa/0x10
  rwsem_down_read_slowpath+0x6ba/0xd00
  ? __pfx_rwsem_down_read_slowpath+0x10/0x10
  ? kernel_text_address+0xb8/0x150
  ? lock_acquire+0x11f/0x290
  ? ceph_start_io_read+0x19/0x80
  down_read+0xcd/0x220
  ? __pfx_down_read+0x10/0x10
  ? do_sys_openat2+0x106/0x160
  ? stack_trace_save+0x96/0xd0
  ? __pfx_stack_trace_save+0x10/0x10
  ceph_start_io_read+0x19/0x80
  ceph_read_iter+0x2e2/0xe70
  ? __pfx_ceph_read_iter+0x10/0x10
  ? psi_task_switch+0x256/0x810
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x206/0x660
  ? finish_task_switch.isra.0+0x1db/0xa40
  vfs_read+0x6e1/0xc40
  ? lock_acquire+0x11f/0x290
  ? finish_task_switch.isra.0+0x129/0xa40
  ? __pfx_vfs_read+0x10/0x10
  ? finish_task_switch.isra.0+0x225/0xa40
  ? fdget_pos+0x1b3/0x540
  ? __pfx___seccomp_filter+0x10/0x10
  ksys_read+0xee/0x1c0
  ? __pfx_ksys_read+0x10/0x10
  ? lock_release+0x206/0x660
  ? __ceph_do_getattr+0xe8/0x380
  do_syscall_64+0x64/0x100
  ? fdget_raw+0x53/0x390
  ? __do_sys_newfstatat+0x86/0xd0
  ? __pfx___do_sys_newfstatat+0x10/0x10
  ? syscall_exit_to_user_mode+0x57/0x120
  ? do_syscall_64+0x70/0x100
  ? irqentry_exit_to_user_mode+0x3d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f71fb04f21d
 RSP: 002b:00007ffdd516a918 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f71fb04f21d
 RDX: 0000000000003003 RSI: 00005597dc55d4d0 RDI: 0000000000000003
 RBP: 0000000000003003 R08: 00007f71fb12a020 R09: 00007f71fb12a020
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
 R13: 00005597dc485340 R14: 00005597dc55d4d0 R15: 00005597be8d7524
  </TASK>
 INFO: task bash:6614 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #298
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:bash            state:D stack:0     pid:6614  tgid:6614   ppid:6613   flags:0x00000002
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? __free_insn_slot+0x370/0x3d0
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? schedule+0x1e8/0x340
  schedule+0xdc/0x340
  schedule_preempt_disabled+0xa/0x10
  rwsem_down_read_slowpath+0x6ba/0xd00
  ? __pfx_rwsem_down_read_slowpath+0x10/0x10
  ? kasan_save_stack+0x1c/0x40
  ? kasan_save_track+0x10/0x30
  ? lock_acquire+0x11f/0x290
  ? ceph_start_io_read+0x19/0x80
  ? find_held_lock+0x2d/0x110
  down_read+0xcd/0x220
  ? __ceph_caps_issued_mask+0x416/0xa10
  ? __pfx_down_read+0x10/0x10
  ceph_start_io_read+0x19/0x80
  ceph_read_iter+0x2e2/0xe70
  ? _copy_to_user+0x50/0x70
  ? __pfx_ceph_read_iter+0x10/0x10
  ? fdget_raw+0x53/0x390
  vfs_read+0x6e1/0xc40
  ? __do_sys_newfstatat+0x86/0xd0
  ? __pfx___do_sys_newfstatat+0x10/0x10
  ? __pfx_vfs_read+0x10/0x10
  ? fdget_pos+0x1b3/0x540
  ? __pfx___seccomp_filter+0x10/0x10
  ksys_read+0xee/0x1c0
  ? __pfx_ksys_read+0x10/0x10
  do_syscall_64+0x64/0x100
  ? do_user_addr_fault+0x401/0x8f0
  ? find_held_lock+0x59/0x110
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x206/0x660
  ? do_user_addr_fault+0x45e/0x8f0
  ? __pfx_lock_release+0x10/0x10
  ? do_user_addr_fault+0x401/0x8f0
  ? do_user_addr_fault+0x463/0x8f0
  ? irqentry_exit_to_user_mode+0x3d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fa84a43a21d
 RSP: 002b:00007ffec8720278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa84a43a21d
 RDX: 0000000000003003 RSI: 0000565007f224d0 RDI: 0000000000000003
 RBP: 0000000000003003 R08: 00007fa84a515020 R09: 00007fa84a515020
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
 R13: 0000565007e4a340 R14: 0000565007f224d0 R15: 0000565005ba8524
  </TASK>

 Showing all locks held in the system:
 1 lock held by khungtaskd/163:
  #0: ffffffffae629b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x64/0x280
 2 locks held by bash/3365:
  #0: ffff8881661803e0 (sb_writers#19){....}-{0:0}, at: ksys_write+0xee/0x1c0
  #1: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_write+0x15/0x30
 1 lock held by bash/6599:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80
 1 lock held by bash/6614:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80

 =============================================


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-12-06 16:29       ` Max Kellermann
@ 2024-12-06 21:13         ` David Howells
  2024-12-07 18:39           ` Max Kellermann
  0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-06 21:13 UTC (permalink / raw)
  To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel

Max Kellermann <max.kellermann@ionos.com> wrote:

> 
> I tried with 6.12.3 + dhowells/netfs-writeback; David's branch solved
> many problems and it took much longer to trigger the hang, but it
> eventually occurred:

Can you get some tracing for me?

echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_failure/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_rreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_sreq/enable
echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_folio/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_read/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_write/enable

(And compress the result before posting)
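The six echo commands above can be wrapped into one loop. This is a sketch extending the commands as given; the tracefs paths follow the standard layout, and the writability guard just lets the script run (and report what it skipped) on a machine where tracefs isn't mounted:

```shell
# Enable the requested netfs/cachefiles tracepoints in one loop.
TRACING=/sys/kernel/debug/tracing
summary=""
for ev in netfs/netfs_failure netfs/netfs_rreq netfs/netfs_sreq \
          netfs/netfs_folio cachefiles/cachefiles_read cachefiles/cachefiles_write
do
    f="$TRACING/events/$ev/enable"
    if [ -w "$f" ]; then
        echo 1 > "$f" && summary="$summary enabled:$ev"
    else
        summary="$summary skipped:$ev"
    fi
done
printf '%s\n' "$summary" | tee trace_enable_summary.txt

# After reproducing the hang, collect and compress the trace for posting:
#   brotli < "$TRACING/trace" > trace.br
```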

Thanks,
David



* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-12-06 21:13         ` David Howells
@ 2024-12-07 18:39           ` Max Kellermann
  2024-12-08 10:20             ` Max Kellermann
  0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-07 18:39 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On Fri, Dec 6, 2024 at 10:13 PM David Howells <dhowells@redhat.com> wrote:
> Can you get some tracing for me?

Here's a Brotli-compressed trace (of 6.12.3 +
dhowells/netfs-writeback). I can reproduce it this way:

1. log in via SSH (cm4all-lukko is our SSH server)
2. "cp" a file on the Ceph home directory
3. log out - bash hangs inside write():

[<0>] folio_wait_bit_common+0x23a/0x4f0
[<0>] folio_wait_private_2_killable+0x3b/0x80
[<0>] netfs_write_begin+0x5a5/0xb90
[<0>] ceph_write_begin+0x6f/0x150
[<0>] generic_perform_write+0x253/0x7d0
[<0>] ceph_write_iter+0xafd/0x1010
[<0>] vfs_write+0x5bc/0x1030
[<0>] ksys_write+0xee/0x1c0
[<0>] do_syscall_64+0x64/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

bash tries to write the history file when logging out and that hangs.
This apparently always happens after somebody does copy_file_range on
Ceph - "cp" in my example.

Triggering this is trivial for me, so if you need any more data, I can get it.

Max

[-- Attachment #2: trace.br --]
[-- Type: application/octet-stream, Size: 178978 bytes --]


* Re: 6.12 WARNING in netfs_consume_read_data()
  2024-12-07 18:39           ` Max Kellermann
@ 2024-12-08 10:20             ` Max Kellermann
  2024-12-09 13:14               ` [PATCH] netfs: Fix ceph copy to cache on write-begin David Howells
  0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-08 10:20 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Sat, Dec 7, 2024 at 7:39 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> Here's a Brotli-compressed trace (of 6.12.3 +
> dhowells/netfs-writeback). I can reproduce it this way:
>
> 1. log in via SSH (cm4all-lukko is our SSH server)
> 2. "cp" a file on the Ceph home directory
> 3. log out - bash hangs inside write():

Today, I took the time to bisect it, and the bisect arrived at
first-bad commit 796a4049640b54cb1daf9e7fe543292c5ca02c74 ("netfs: In
readahead, put the folio refs as soon extracted").


* [PATCH] netfs: Fix ceph copy to cache on write-begin
  2024-12-08 10:20             ` Max Kellermann
@ 2024-12-09 13:14               ` David Howells
  2024-12-09 13:31                 ` Max Kellermann
  0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-09 13:14 UTC (permalink / raw)
  To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel

Hi Max,

Could you try this?

David
---
netfs: Fix ceph copy to cache on write-begin

At the end of netfs_unlock_read_folio() in which folios are marked
appropriately for copying to the cache (either by being marked dirty
and having their private data set or by having PG_private_2 set) and then
unlocked, the folio_queue struct has the entry pointing to the folio
cleared.  This presents a problem for netfs_pgpriv2_write_to_the_cache(),
which is used to write folios marked with PG_private_2 to the cache as it
expects to be able to trawl the folio_queue list thereafter to find the
relevant folios, leading to a hang.

Fix this by not clearing the folio_queue entry if we're going to do the
deprecated copy-to-cache.  The clearance will be done instead as the folios
are written to the cache.

This can be reproduced by starting cachefiles, mounting a ceph filesystem
with "-o fsc" and writing to it.

Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/read_collect.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 849f40f64443..72a16222b63b 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -62,10 +62,14 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq,
 		} else {
 			trace_netfs_folio(folio, netfs_folio_trace_read_done);
 		}
+
+		folioq_clear(folioq, slot);
 	} else {
 		// TODO: Use of PG_private_2 is deprecated.
 		if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
 			netfs_pgpriv2_mark_copy_to_cache(subreq, rreq, folioq, slot);
+		else
+			folioq_clear(folioq, slot);
 	}
 
 	if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
@@ -77,8 +81,6 @@ static void netfs_unlock_read_folio(struct netfs_io_subrequest *subreq,
 			folio_unlock(folio);
 		}
 	}
-
-	folioq_clear(folioq, slot);
 }
 
 /*



* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
  2024-12-09 13:14               ` [PATCH] netfs: Fix ceph copy to cache on write-begin David Howells
@ 2024-12-09 13:31                 ` Max Kellermann
  2024-12-12 16:00                   ` David Howells
  0 siblings, 1 reply; 14+ messages in thread
From: Max Kellermann @ 2024-12-09 13:31 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Mon, Dec 9, 2024 at 2:14 PM David Howells <dhowells@redhat.com> wrote:
> Could you try this?

No change, still hangs immediately on the first try.


* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
  2024-12-09 13:31                 ` Max Kellermann
@ 2024-12-12 16:00                   ` David Howells
  2025-01-07 10:58                     ` Max Kellermann
  0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2024-12-12 16:00 UTC (permalink / raw)
  To: Max Kellermann; +Cc: dhowells, Jeff Layton, netfs, linux-fsdevel, linux-kernel

How about if you add the attached?

For convenience, I've put the outstanding fix patches I have here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes

David
---
commit d0bc2ecca996105f55da22e8867905ca1dad7c8f
Author: David Howells <dhowells@redhat.com>
Date:   Thu Dec 12 15:26:24 2024 +0000

    netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
    
    When the caching for a cookie is temporarily disabled (e.g. due to a DIO
    write on that file), future copying to the cache for that file is disabled
    until all fds open on that file are closed.  However, if netfslib is using
    the deprecated PG_private_2 method (such as is currently used by ceph), and
    decides it wants to copy to the cache, netfs_advance_write() will just bail
    at the first check seeing that the cache stream is unavailable, and
    indicate that it dealt with all the content.
    
    This means that we have no subrequests to provide notifications to drive
    the state machine or even to pin the request and the request just gets
    discarded, leaving the folios with PG_private_2 set.
    
    Fix this by jumping directly to cancel the request if the cache is not
    available.  That way, we don't remove mark3 from the folio_queue list and
    netfs_pgpriv2_cancel() will clean up the folios.
    
    This was found by running the generic/013 xfstest against ceph with an active
    cache and the "-o fsc" option passed to ceph.  That would usually hang.
    
    Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
    Reported-by: Max Kellermann <max.kellermann@ionos.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: Ilya Dryomov <idryomov@gmail.com>
    cc: Xiubo Li <xiubli@redhat.com>
    cc: netfs@lists.linux.dev
    cc: ceph-devel@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index ba5af89d37fa..54d5004fec18 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -170,6 +170,10 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
 
 	trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
 	netfs_stat(&netfs_n_wh_copy_to_cache);
+	if (!wreq->io_streams[1].avail) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+		goto couldnt_start;
+	}
 
 	for (;;) {
 		error = netfs_pgpriv2_copy_folio(wreq, folio);



* Re: [PATCH] netfs: Fix ceph copy to cache on write-begin
  2024-12-12 16:00                   ` David Howells
@ 2025-01-07 10:58                     ` Max Kellermann
  0 siblings, 0 replies; 14+ messages in thread
From: Max Kellermann @ 2025-01-07 10:58 UTC (permalink / raw)
  To: David Howells; +Cc: Jeff Layton, netfs, linux-fsdevel, linux-kernel

On Thu, Dec 12, 2024 at 5:00 PM David Howells <dhowells@redhat.com> wrote:
> How about if you add the attached?

Still no change; the first try led to a hang, as always.

(Sorry for the late reply, I was offline on vacation)


end of thread, other threads:[~2025-01-07 10:58 UTC | newest]

Thread overview: 14+ messages
2024-11-18 23:04 6.12 WARNING in netfs_consume_read_data() Max Kellermann
2024-11-29 12:41 ` David Howells
2024-11-29 21:06   ` Max Kellermann
2024-11-29 22:29   ` Max Kellermann
2024-12-06 15:08     ` Max Kellermann
2024-12-06 15:13       ` Max Kellermann
2024-12-06 16:29       ` Max Kellermann
2024-12-06 21:13         ` David Howells
2024-12-07 18:39           ` Max Kellermann
2024-12-08 10:20             ` Max Kellermann
2024-12-09 13:14               ` [PATCH] netfs: Fix ceph copy to cache on write-begin David Howells
2024-12-09 13:31                 ` Max Kellermann
2024-12-12 16:00                   ` David Howells
2025-01-07 10:58                     ` Max Kellermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox