* [PATCH 0/2] smb: client: fix problems with smbdirect/rdma mounts
@ 2025-06-18 16:51 Stefan Metzmacher
2025-06-18 16:51 ` [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma() Stefan Metzmacher
2025-06-18 16:51 ` [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data Stefan Metzmacher
0 siblings, 2 replies; 15+ messages in thread
From: Stefan Metzmacher @ 2025-06-18 16:51 UTC (permalink / raw)
To: linux-cifs; +Cc: metze, Steve French, David Howells, Tom Talpey
Hi,
here are two regression fixes for RDMA mounts using smbdirect.

I haven't actually verified that everything worked before
commit 3d78fe73fa123964be30f0acec449dc8a2241dae
("cifs: Build the RDMA SGE list directly from an iterator"),
but from reading the code it seems possible that it worked
before.
Stefan Metzmacher (2):
smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
smb: client: let smbd_post_send_iter() respect the peers max_send_size
and transmit all data
fs/smb/client/smbdirect.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
--
2.34.1
^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
  2025-06-18 16:51 [PATCH 0/2] smb: client: fix problems with smbdirect/rdma mounts Stefan Metzmacher
@ 2025-06-18 16:51 ` Stefan Metzmacher
  2025-06-19 11:41   ` David Howells
  2025-06-18 16:51 ` [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data Stefan Metzmacher
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Metzmacher @ 2025-06-18 16:51 UTC (permalink / raw)
To: linux-cifs; +Cc: metze, Steve French, David Howells, Tom Talpey

This fixes the following problem:

[ 749.901015] [ T8673] run fstests cifs/001 at 2025-06-17 09:40:30
[ 750.346409] [ T9870] ==================================================================
[ 750.346814] [ T9870] BUG: KASAN: slab-out-of-bounds in smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.347330] [ T9870] Write of size 8 at addr ffff888011082890 by task xfs_io/9870
[ 750.347705] [ T9870]
[ 750.348077] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Not tainted 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary)
[ 750.348082] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 750.348085] [ T9870] Call Trace:
[ 750.348086] [ T9870]  <TASK>
[ 750.348088] [ T9870]  dump_stack_lvl+0x76/0xa0
[ 750.348106] [ T9870]  print_report+0xd1/0x640
[ 750.348116] [ T9870]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.348120] [ T9870]  ? kasan_complete_mode_report_info+0x26/0x210
[ 750.348124] [ T9870]  kasan_report+0xe7/0x130
[ 750.348128] [ T9870]  ? smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348262] [ T9870]  ? smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348377] [ T9870]  __asan_report_store8_noabort+0x17/0x30
[ 750.348381] [ T9870]  smb_set_sge+0x2cc/0x3b0 [cifs]
[ 750.348496] [ T9870]  smbd_post_send_iter+0x1990/0x3070 [cifs]
[ 750.348625] [ T9870]  ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs]
[ 750.348741] [ T9870]  ? update_stack_state+0x2a0/0x670
[ 750.348749] [ T9870]  ? cifs_flush+0x153/0x320 [cifs]
[ 750.348870] [ T9870]  ? cifs_flush+0x153/0x320 [cifs]
[ 750.348990] [ T9870]  ? update_stack_state+0x2a0/0x670
[ 750.348995] [ T9870]  smbd_send+0x58c/0x9c0 [cifs]
[ 750.349117] [ T9870]  ? __pfx_smbd_send+0x10/0x10 [cifs]
[ 750.349231] [ T9870]  ? unwind_get_return_address+0x65/0xb0
[ 750.349235] [ T9870]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 750.349242] [ T9870]  ? arch_stack_walk+0xa7/0x100
[ 750.349250] [ T9870]  ? stack_trace_save+0x92/0xd0
[ 750.349254] [ T9870]  __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.349374] [ T9870]  ? kernel_text_address+0x173/0x190
[ 750.349379] [ T9870]  ? kasan_save_stack+0x39/0x70
[ 750.349382] [ T9870]  ? kasan_save_track+0x18/0x70
[ 750.349385] [ T9870]  ? __kasan_slab_alloc+0x9d/0xa0
[ 750.349389] [ T9870]  ? __pfx___smb_send_rqst+0x10/0x10 [cifs]
[ 750.349508] [ T9870]  ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs]
[ 750.349626] [ T9870]  ? cifs_call_async+0x277/0xb00 [cifs]
[ 750.349746] [ T9870]  ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.349867] [ T9870]  ? netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.349900] [ T9870]  ? netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.349929] [ T9870]  ? netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.349958] [ T9870]  ? netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.349987] [ T9870]  ? do_writepages+0x21f/0x590
[ 750.349993] [ T9870]  ? filemap_fdatawrite_wbc+0xe1/0x140
[ 750.349997] [ T9870]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.350002] [ T9870]  smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.350131] [ T9870]  ? __pfx_smb_send_rqst+0x10/0x10 [cifs]
[ 750.350255] [ T9870]  ? local_clock_noinstr+0xe/0xd0
[ 750.350261] [ T9870]  ? kasan_save_alloc_info+0x37/0x60
[ 750.350268] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.350271] [ T9870]  ? _raw_spin_lock+0x81/0xf0
[ 750.350275] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.350278] [ T9870]  ? smb2_setup_async_request+0x293/0x580 [cifs]
[ 750.350398] [ T9870]  cifs_call_async+0x477/0xb00 [cifs]
[ 750.350518] [ T9870]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[ 750.350636] [ T9870]  ? __pfx_cifs_call_async+0x10/0x10 [cifs]
[ 750.350756] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.350760] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.350763] [ T9870]  ? __smb2_plain_req_init+0x933/0x1090 [cifs]
[ 750.350891] [ T9870]  smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.351008] [ T9870]  ? sched_clock_noinstr+0x9/0x10
[ 750.351012] [ T9870]  ? local_clock_noinstr+0xe/0xd0
[ 750.351018] [ T9870]  ? __pfx_smb2_async_writev+0x10/0x10 [cifs]
[ 750.351144] [ T9870]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.351150] [ T9870]  ? _raw_spin_unlock+0xe/0x40
[ 750.351154] [ T9870]  ? cifs_pick_channel+0x242/0x370 [cifs]
[ 750.351275] [ T9870]  cifs_issue_write+0x256/0x610 [cifs]
[ 750.351554] [ T9870]  ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.351677] [ T9870]  netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.351710] [ T9870]  netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.351740] [ T9870]  ? rolling_buffer_append+0x12d/0x440 [netfs]
[ 750.351769] [ T9870]  netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.351798] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.351804] [ T9870]  netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.351835] [ T9870]  ? __pfx_netfs_writepages+0x10/0x10 [netfs]
[ 750.351864] [ T9870]  ? exit_files+0xab/0xe0
[ 750.351867] [ T9870]  ? do_exit+0x148f/0x2980
[ 750.351871] [ T9870]  ? do_group_exit+0xb5/0x250
[ 750.351874] [ T9870]  ? arch_do_signal_or_restart+0x92/0x630
[ 750.351879] [ T9870]  ? exit_to_user_mode_loop+0x98/0x170
[ 750.351882] [ T9870]  ? do_syscall_64+0x2cf/0xd80
[ 750.351886] [ T9870]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.351890] [ T9870]  do_writepages+0x21f/0x590
[ 750.351894] [ T9870]  ? __pfx_do_writepages+0x10/0x10
[ 750.351897] [ T9870]  filemap_fdatawrite_wbc+0xe1/0x140
[ 750.351901] [ T9870]  __filemap_fdatawrite_range+0xba/0x100
[ 750.351904] [ T9870]  ? __pfx___filemap_fdatawrite_range+0x10/0x10
[ 750.351912] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.351916] [ T9870]  filemap_write_and_wait_range+0x7d/0xf0
[ 750.351920] [ T9870]  cifs_flush+0x153/0x320 [cifs]
[ 750.352042] [ T9870]  filp_flush+0x107/0x1a0
[ 750.352046] [ T9870]  filp_close+0x14/0x30
[ 750.352049] [ T9870]  put_files_struct.part.0+0x126/0x2a0
[ 750.352053] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.352058] [ T9870]  exit_files+0xab/0xe0
[ 750.352061] [ T9870]  do_exit+0x148f/0x2980
[ 750.352065] [ T9870]  ? __pfx_do_exit+0x10/0x10
[ 750.352069] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.352072] [ T9870]  ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.352076] [ T9870]  do_group_exit+0xb5/0x250
[ 750.352080] [ T9870]  get_signal+0x22d3/0x22e0
[ 750.352086] [ T9870]  ? __pfx_get_signal+0x10/0x10
[ 750.352089] [ T9870]  ? fpregs_assert_state_consistent+0x68/0x100
[ 750.352101] [ T9870]  ? folio_add_lru+0xda/0x120
[ 750.352105] [ T9870]  arch_do_signal_or_restart+0x92/0x630
[ 750.352109] [ T9870]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[ 750.352115] [ T9870]  exit_to_user_mode_loop+0x98/0x170
[ 750.352118] [ T9870]  do_syscall_64+0x2cf/0xd80
[ 750.352123] [ T9870]  ? __kasan_check_read+0x11/0x20
[ 750.352126] [ T9870]  ? count_memcg_events+0x1b4/0x420
[ 750.352132] [ T9870]  ? handle_mm_fault+0x148/0x690
[ 750.352136] [ T9870]  ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.352140] [ T9870]  ? __kasan_check_read+0x11/0x20
[ 750.352143] [ T9870]  ? fpregs_assert_state_consistent+0x68/0x100
[ 750.352146] [ T9870]  ? irqentry_exit_to_user_mode+0x2e/0x250
[ 750.352151] [ T9870]  ? irqentry_exit+0x43/0x50
[ 750.352154] [ T9870]  ? exc_page_fault+0x75/0xe0
[ 750.352160] [ T9870]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.352163] [ T9870] RIP: 0033:0x7858c94ab6e2
[ 750.352167] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8.
[ 750.352175] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 750.352179] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2
[ 750.352182] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 750.352184] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000
[ 750.352185] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0
[ 750.352187] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230
[ 750.352191] [ T9870]  </TASK>
[ 750.352195] [ T9870]
[ 750.395206] [ T9870] Allocated by task 9870 on cpu 0 at 750.346406s:
[ 750.395523] [ T9870]  kasan_save_stack+0x39/0x70
[ 750.395532] [ T9870]  kasan_save_track+0x18/0x70
[ 750.395536] [ T9870]  kasan_save_alloc_info+0x37/0x60
[ 750.395539] [ T9870]  __kasan_slab_alloc+0x9d/0xa0
[ 750.395543] [ T9870]  kmem_cache_alloc_noprof+0x13c/0x3f0
[ 750.395548] [ T9870]  mempool_alloc_slab+0x15/0x20
[ 750.395553] [ T9870]  mempool_alloc_noprof+0x135/0x340
[ 750.395557] [ T9870]  smbd_post_send_iter+0x63e/0x3070 [cifs]
[ 750.395694] [ T9870]  smbd_send+0x58c/0x9c0 [cifs]
[ 750.395819] [ T9870]  __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.395950] [ T9870]  smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.396081] [ T9870]  cifs_call_async+0x477/0xb00 [cifs]
[ 750.396232] [ T9870]  smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.396359] [ T9870]  cifs_issue_write+0x256/0x610 [cifs]
[ 750.396492] [ T9870]  netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.396544] [ T9870]  netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.396576] [ T9870]  netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.396608] [ T9870]  netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.396639] [ T9870]  do_writepages+0x21f/0x590
[ 750.396643] [ T9870]  filemap_fdatawrite_wbc+0xe1/0x140
[ 750.396647] [ T9870]  __filemap_fdatawrite_range+0xba/0x100
[ 750.396651] [ T9870]  filemap_write_and_wait_range+0x7d/0xf0
[ 750.396656] [ T9870]  cifs_flush+0x153/0x320 [cifs]
[ 750.396787] [ T9870]  filp_flush+0x107/0x1a0
[ 750.396791] [ T9870]  filp_close+0x14/0x30
[ 750.396795] [ T9870]  put_files_struct.part.0+0x126/0x2a0
[ 750.396800] [ T9870]  exit_files+0xab/0xe0
[ 750.396803] [ T9870]  do_exit+0x148f/0x2980
[ 750.396808] [ T9870]  do_group_exit+0xb5/0x250
[ 750.396813] [ T9870]  get_signal+0x22d3/0x22e0
[ 750.396817] [ T9870]  arch_do_signal_or_restart+0x92/0x630
[ 750.396822] [ T9870]  exit_to_user_mode_loop+0x98/0x170
[ 750.396827] [ T9870]  do_syscall_64+0x2cf/0xd80
[ 750.396832] [ T9870]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.396836] [ T9870]
[ 750.397150] [ T9870] The buggy address belongs to the object at ffff888011082800 which belongs to the cache smbd_request_0000000008f3bd7b of size 144
[ 750.397798] [ T9870] The buggy address is located 0 bytes to the right of allocated 144-byte region [ffff888011082800, ffff888011082890)
[ 750.398469] [ T9870]
[ 750.398800] [ T9870] The buggy address belongs to the physical page:
[ 750.399141] [ T9870] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11082
[ 750.399148] [ T9870] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 750.399155] [ T9870] page_type: f5(slab)
[ 750.399161] [ T9870] raw: 000fffffc0000000 ffff888022d65640 dead000000000122 0000000000000000
[ 750.399165] [ T9870] raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
[ 750.399169] [ T9870] page dumped because: kasan: bad access detected
[ 750.399172] [ T9870]
[ 750.399505] [ T9870] Memory state around the buggy address:
[ 750.399863] [ T9870]  ffff888011082780: fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.400247] [ T9870]  ffff888011082800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 750.400618] [ T9870] >ffff888011082880: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.400982] [ T9870]                          ^
[ 750.401370] [ T9870]  ffff888011082900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.401774] [ T9870]  ffff888011082980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 750.402171] [ T9870] ==================================================================
[ 750.402696] [ T9870] Disabling lock debugging due to kernel taint
[ 750.403202] [ T9870] BUG: unable to handle page fault for address: ffff8880110a2000
[ 750.403797] [ T9870] #PF: supervisor write access in kernel mode
[ 750.404204] [ T9870] #PF: error_code(0x0003) - permissions violation
[ 750.404581] [ T9870] PGD 5ce01067 P4D 5ce01067 PUD 5ce02067 PMD 78aa063 PTE 80000000110a2021
[ 750.404969] [ T9870] Oops: Oops: 0003 [#1] SMP KASAN PTI
[ 750.405394] [ T9870] CPU: 0 UID: 0 PID: 9870 Comm: xfs_io Kdump: loaded Tainted: G    B 6.16.0-rc2-metze.02+ #1 PREEMPT(voluntary)
[ 750.406510] [ T9870] Tainted: [B]=BAD_PAGE
[ 750.406967] [ T9870] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 750.407440] [ T9870] RIP: 0010:smb_set_sge+0x15c/0x3b0 [cifs]
[ 750.408065] [ T9870] Code: 48 83 f8 ff 0f 84 b0 00 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 80 3c 11 00 0f 85 69 01 00 00 49 8d 7c 24 08 <49> 89 04 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f
[ 750.409283] [ T9870] RSP: 0018:ffffc90005e2e758 EFLAGS: 00010246
[ 750.409803] [ T9870] RAX: ffff888036c53400 RBX: ffffc90005e2e878 RCX: 1ffff11002214400
[ 750.410323] [ T9870] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8880110a2008
[ 750.411217] [ T9870] RBP: ffffc90005e2e798 R08: 0000000000000001 R09: 0000000000000400
[ 750.411770] [ T9870] R10: ffff888011082800 R11: 0000000000000000 R12: ffff8880110a2000
[ 750.412325] [ T9870] R13: 0000000000000000 R14: ffffc90005e2e888 R15: ffff88801a4b6000
[ 750.412901] [ T9870] FS:  0000000000000000(0000) GS:ffff88812bc68000(0000) knlGS:0000000000000000
[ 750.413477] [ T9870] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 750.414077] [ T9870] CR2: ffff8880110a2000 CR3: 000000005b0a6005 CR4: 00000000000726f0
[ 750.414654] [ T9870] Call Trace:
[ 750.415211] [ T9870]  <TASK>
[ 750.415748] [ T9870]  smbd_post_send_iter+0x1990/0x3070 [cifs]
[ 750.416449] [ T9870]  ? __pfx_smbd_post_send_iter+0x10/0x10 [cifs]
[ 750.417128] [ T9870]  ? update_stack_state+0x2a0/0x670
[ 750.417685] [ T9870]  ? cifs_flush+0x153/0x320 [cifs]
[ 750.418380] [ T9870]  ? cifs_flush+0x153/0x320 [cifs]
[ 750.419055] [ T9870]  ? update_stack_state+0x2a0/0x670
[ 750.419624] [ T9870]  smbd_send+0x58c/0x9c0 [cifs]
[ 750.420297] [ T9870]  ? __pfx_smbd_send+0x10/0x10 [cifs]
[ 750.420936] [ T9870]  ? unwind_get_return_address+0x65/0xb0
[ 750.421456] [ T9870]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 750.421954] [ T9870]  ? arch_stack_walk+0xa7/0x100
[ 750.422460] [ T9870]  ? stack_trace_save+0x92/0xd0
[ 750.422948] [ T9870]  __smb_send_rqst+0x931/0xec0 [cifs]
[ 750.423579] [ T9870]  ? kernel_text_address+0x173/0x190
[ 750.424056] [ T9870]  ? kasan_save_stack+0x39/0x70
[ 750.424813] [ T9870]  ? kasan_save_track+0x18/0x70
[ 750.425323] [ T9870]  ? __kasan_slab_alloc+0x9d/0xa0
[ 750.425831] [ T9870]  ? __pfx___smb_send_rqst+0x10/0x10 [cifs]
[ 750.426548] [ T9870]  ? smb2_mid_entry_alloc+0xb4/0x7e0 [cifs]
[ 750.427231] [ T9870]  ? cifs_call_async+0x277/0xb00 [cifs]
[ 750.427882] [ T9870]  ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.428909] [ T9870]  ? netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.429425] [ T9870]  ? netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.429882] [ T9870]  ? netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.430345] [ T9870]  ? netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.430809] [ T9870]  ? do_writepages+0x21f/0x590
[ 750.431239] [ T9870]  ? filemap_fdatawrite_wbc+0xe1/0x140
[ 750.431652] [ T9870]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.432041] [ T9870]  smb_send_rqst+0x22e/0x2f0 [cifs]
[ 750.432586] [ T9870]  ? __pfx_smb_send_rqst+0x10/0x10 [cifs]
[ 750.433108] [ T9870]  ? local_clock_noinstr+0xe/0xd0
[ 750.433482] [ T9870]  ? kasan_save_alloc_info+0x37/0x60
[ 750.433855] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.434214] [ T9870]  ? _raw_spin_lock+0x81/0xf0
[ 750.434561] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.434903] [ T9870]  ? smb2_setup_async_request+0x293/0x580 [cifs]
[ 750.435394] [ T9870]  cifs_call_async+0x477/0xb00 [cifs]
[ 750.435892] [ T9870]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[ 750.436388] [ T9870]  ? __pfx_cifs_call_async+0x10/0x10 [cifs]
[ 750.436881] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.437237] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.437579] [ T9870]  ? __smb2_plain_req_init+0x933/0x1090 [cifs]
[ 750.438062] [ T9870]  smb2_async_writev+0x15ff/0x2460 [cifs]
[ 750.438557] [ T9870]  ? sched_clock_noinstr+0x9/0x10
[ 750.438906] [ T9870]  ? local_clock_noinstr+0xe/0xd0
[ 750.439293] [ T9870]  ? __pfx_smb2_async_writev+0x10/0x10 [cifs]
[ 750.439786] [ T9870]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 750.440143] [ T9870]  ? _raw_spin_unlock+0xe/0x40
[ 750.440495] [ T9870]  ? cifs_pick_channel+0x242/0x370 [cifs]
[ 750.440989] [ T9870]  cifs_issue_write+0x256/0x610 [cifs]
[ 750.441492] [ T9870]  ? cifs_issue_write+0x256/0x610 [cifs]
[ 750.441987] [ T9870]  netfs_do_issue_write+0xc2/0x340 [netfs]
[ 750.442387] [ T9870]  netfs_advance_write+0x45b/0x1270 [netfs]
[ 750.442969] [ T9870]  ? rolling_buffer_append+0x12d/0x440 [netfs]
[ 750.443376] [ T9870]  netfs_write_folio+0xd6c/0x1be0 [netfs]
[ 750.443768] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.444145] [ T9870]  netfs_writepages+0x2e9/0xa80 [netfs]
[ 750.444541] [ T9870]  ? __pfx_netfs_writepages+0x10/0x10 [netfs]
[ 750.444936] [ T9870]  ? exit_files+0xab/0xe0
[ 750.445312] [ T9870]  ? do_exit+0x148f/0x2980
[ 750.445672] [ T9870]  ? do_group_exit+0xb5/0x250
[ 750.446028] [ T9870]  ? arch_do_signal_or_restart+0x92/0x630
[ 750.446402] [ T9870]  ? exit_to_user_mode_loop+0x98/0x170
[ 750.446762] [ T9870]  ? do_syscall_64+0x2cf/0xd80
[ 750.447132] [ T9870]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.447499] [ T9870]  do_writepages+0x21f/0x590
[ 750.447859] [ T9870]  ? __pfx_do_writepages+0x10/0x10
[ 750.448236] [ T9870]  filemap_fdatawrite_wbc+0xe1/0x140
[ 750.448595] [ T9870]  __filemap_fdatawrite_range+0xba/0x100
[ 750.448953] [ T9870]  ? __pfx___filemap_fdatawrite_range+0x10/0x10
[ 750.449336] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.449697] [ T9870]  filemap_write_and_wait_range+0x7d/0xf0
[ 750.450062] [ T9870]  cifs_flush+0x153/0x320 [cifs]
[ 750.450592] [ T9870]  filp_flush+0x107/0x1a0
[ 750.450952] [ T9870]  filp_close+0x14/0x30
[ 750.451322] [ T9870]  put_files_struct.part.0+0x126/0x2a0
[ 750.451678] [ T9870]  ? __pfx__raw_spin_lock+0x10/0x10
[ 750.452033] [ T9870]  exit_files+0xab/0xe0
[ 750.452401] [ T9870]  do_exit+0x148f/0x2980
[ 750.452751] [ T9870]  ? __pfx_do_exit+0x10/0x10
[ 750.453109] [ T9870]  ? __kasan_check_write+0x14/0x30
[ 750.453459] [ T9870]  ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.453787] [ T9870]  do_group_exit+0xb5/0x250
[ 750.454082] [ T9870]  get_signal+0x22d3/0x22e0
[ 750.454406] [ T9870]  ? __pfx_get_signal+0x10/0x10
[ 750.454709] [ T9870]  ? fpregs_assert_state_consistent+0x68/0x100
[ 750.455031] [ T9870]  ? folio_add_lru+0xda/0x120
[ 750.455347] [ T9870]  arch_do_signal_or_restart+0x92/0x630
[ 750.455656] [ T9870]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[ 750.455967] [ T9870]  exit_to_user_mode_loop+0x98/0x170
[ 750.456282] [ T9870]  do_syscall_64+0x2cf/0xd80
[ 750.456591] [ T9870]  ? __kasan_check_read+0x11/0x20
[ 750.456897] [ T9870]  ? count_memcg_events+0x1b4/0x420
[ 750.457280] [ T9870]  ? handle_mm_fault+0x148/0x690
[ 750.457616] [ T9870]  ? _raw_spin_lock_irq+0x8a/0xf0
[ 750.457925] [ T9870]  ? __kasan_check_read+0x11/0x20
[ 750.458297] [ T9870]  ? fpregs_assert_state_consistent+0x68/0x100
[ 750.458672] [ T9870]  ? irqentry_exit_to_user_mode+0x2e/0x250
[ 750.459191] [ T9870]  ? irqentry_exit+0x43/0x50
[ 750.459600] [ T9870]  ? exc_page_fault+0x75/0xe0
[ 750.460130] [ T9870]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 750.460570] [ T9870] RIP: 0033:0x7858c94ab6e2
[ 750.461206] [ T9870] Code: Unable to access opcode bytes at 0x7858c94ab6b8.
[ 750.461780] [ T9870] RSP: 002b:00007858c9248ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 750.462327] [ T9870] RAX: fffffffffffffdfe RBX: 00007858c92496c0 RCX: 00007858c94ab6e2
[ 750.462653] [ T9870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 750.462969] [ T9870] RBP: 00007858c9248d10 R08: 0000000000000000 R09: 0000000000000000
[ 750.463290] [ T9870] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffde0
[ 750.463640] [ T9870] R13: 0000000000000020 R14: 0000000000000002 R15: 00007ffc072d2230
[ 750.463965] [ T9870]  </TASK>
[ 750.464285] [ T9870] Modules linked in: siw ib_uverbs ccm cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs softdog vboxsf vboxguest cpuid intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus joydev input_leds mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo usbhid drm_vram_helper psmouse vga16fb vgastate drm_ttm_helper serio_raw hid ahci libahci ttm pata_acpi video wmi [last unloaded: vboxguest]
[ 750.467127] [ T9870] CR2: ffff8880110a2000

cc: Steve French <sfrench@samba.org>
cc: David Howells <dhowells@redhat.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
Fixes: c45ebd636c32 ("cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
---
 fs/smb/client/smbdirect.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
index 5ae847919da5..cbc85bca006f 100644
--- a/fs/smb/client/smbdirect.c
+++ b/fs/smb/client/smbdirect.c
@@ -2589,13 +2589,14 @@ static ssize_t smb_extract_folioq_to_rdma(struct iov_iter *iter,
 		size_t fsize = folioq_folio_size(folioq, slot);
 
 		if (offset < fsize) {
-			size_t part = umin(maxsize - ret, fsize - offset);
+			size_t part = umin(maxsize, fsize - offset);
 
 			if (!smb_set_sge(rdma, folio_page(folio, 0), offset, part))
 				return -EIO;
 
 			offset += part;
 			ret += part;
+			maxsize -= part;
 		}
 
 		if (offset >= fsize) {
@@ -2610,7 +2611,7 @@ static ssize_t smb_extract_folioq_to_rdma(struct iov_iter *iter,
 			slot = 0;
 		}
 	}
-	} while (rdma->nr_sge < rdma->max_sge || maxsize > 0);
+	} while (rdma->nr_sge < rdma->max_sge && maxsize > 0);
 
 	iter->folioq = folioq;
 	iter->folioq_slot = slot;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
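[Editor's note: the effect of the `||` to `&&` change above can be sketched with a small stand-alone model. The names and sizes here are hypothetical simplifications, not the real cifs code: with `||`, the do/while keeps mapping chunks whenever payload remains, even after the SGE limit is reached; with `&&`, it stops at the limit.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SGE 6	/* stand-in for rdma->max_sge */

/*
 * Mini-model of the smb_extract_folioq_to_rdma() termination logic.
 * Each iteration maps one folio-sized chunk into one SGE slot.
 * Returns the number of SGE slots "written"; any result above MAX_SGE
 * corresponds to an out-of-bounds write in the real sge[] array.
 */
static int sges_used(size_t maxsize, size_t fsize, int fixed)
{
	int nr_sge = 0;

	do {
		size_t part = maxsize < fsize ? maxsize : fsize; /* umin() */

		nr_sge++;		/* models smb_set_sge() */
		maxsize -= part;

		if (fixed) {
			/* patched: stop when EITHER limit is hit */
			if (!(nr_sge < MAX_SGE && maxsize > 0))
				break;
		} else {
			/* buggy: stops only when BOTH limits are hit */
			if (!(nr_sge < MAX_SGE || maxsize > 0))
				break;
		}
	} while (1);

	return nr_sge;
}
```

With ten 4 KiB folios, the buggy condition consumes ten SGE slots against a six-entry table, while the fixed condition stops at six and leaves the remaining `maxsize` for the caller.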
* Re: [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
  2025-06-18 16:51 ` [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma() Stefan Metzmacher
@ 2025-06-19 11:41   ` David Howells
  2025-06-19 19:07     ` Tom Talpey
  0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2025-06-19 11:41 UTC (permalink / raw)
To: Stefan Metzmacher; +Cc: dhowells, linux-cifs, Steve French, Tom Talpey

Stefan Metzmacher <metze@samba.org> wrote:

> 		if (offset < fsize) {
> -			size_t part = umin(maxsize - ret, fsize - offset);
> +			size_t part = umin(maxsize, fsize - offset);
> 
> 			if (!smb_set_sge(rdma, folio_page(folio, 0), offset, part))
> 				return -EIO;
> 
> 			offset += part;
> 			ret += part;
> +			maxsize -= part;

I don't think these two changes should make a difference.

> 		}
> 
> 		if (offset >= fsize) {
> @@ -2610,7 +2611,7 @@ static ssize_t smb_extract_folioq_to_rdma(struct iov_iter *iter,
> 			slot = 0;
> 		}
> 	}
> -	} while (rdma->nr_sge < rdma->max_sge || maxsize > 0);
> +	} while (rdma->nr_sge < rdma->max_sge && maxsize > 0);
> 
> 	iter->folioq = folioq;
> 	iter->folioq_slot = slot;

But this one definitely should!

Reviewed-by: David Howells <dhowells@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
  2025-06-19 11:41   ` David Howells
@ 2025-06-19 19:07     ` Tom Talpey
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Talpey @ 2025-06-19 19:07 UTC (permalink / raw)
To: David Howells, Stefan Metzmacher; +Cc: linux-cifs, Steve French

On 6/19/2025 7:41 AM, David Howells wrote:
> Stefan Metzmacher <metze@samba.org> wrote:
> 
>> 		if (offset < fsize) {
>> -			size_t part = umin(maxsize - ret, fsize - offset);
>> +			size_t part = umin(maxsize, fsize - offset);
>> 
>> 			if (!smb_set_sge(rdma, folio_page(folio, 0), offset, part))
>> 				return -EIO;
>> 
>> 			offset += part;
>> 			ret += part;
>> +			maxsize -= part;
> 
> I don't think these two changes should make a difference.
> 
>> 		}
>> 
>> 		if (offset >= fsize) {
>> @@ -2610,7 +2611,7 @@ static ssize_t smb_extract_folioq_to_rdma(struct iov_iter *iter,
>> 			slot = 0;
>> 		}
>> 	}
>> -	} while (rdma->nr_sge < rdma->max_sge || maxsize > 0);
>> +	} while (rdma->nr_sge < rdma->max_sge && maxsize > 0);
>> 
>> 	iter->folioq = folioq;
>> 	iter->folioq_slot = slot;
> 
> But this one definitely should!
> 
> Reviewed-by: David Howells <dhowells@redhat.com>

Agreed! This code looks to have been significantly refactored from
before; it used to be a for loop with the max_sge and maxsize tests
inverted as "break" conditions. The || got lost in translation when
it became a do/while termination. Looks like it happened between
6.11 and 6.12.

Reviewed-by: Tom Talpey <tom@talpey.com>

I assume the Fixes tag is appropriate for the 6.12+ stable kernels to
pick up?

Tom.

^ permalink raw reply	[flat|nested] 15+ messages in thread
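[Editor's note: Tom's observation about the lost `||` is a plain De Morgan translation. A for loop that breaks when *either* limit is hit continues under the *conjunction* of the negated break conditions, so the do/while must use `&&`. A tiny sanity check with hypothetical helpers, not kernel code:]

```c
#include <assert.h>

/*
 * Break conditions from the old for loop, as booleans:
 *   full = (rdma->nr_sge >= rdma->max_sge)
 *   done = (maxsize == 0)
 */

/* for (;;) { if (full) break; if (done) break; ... } */
static int for_loop_continues(int full, int done)
{
	return !(full || done);
}

/* do { ... } while (!full && !done); -- the correct translation */
static int do_while_continues(int full, int done)
{
	return !full && !done;
}

/* do { ... } while (!full || !done); -- the buggy translation */
static int buggy_continues(int full, int done)
{
	return !full || !done;
}
```

The buggy form differs exactly when one condition holds and the other does not, which is precisely the overflow case: payload remaining (`!done`) kept the loop running even after the SGE table was full.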
* [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
  2025-06-18 16:51 [PATCH 0/2] smb: client: fix problems with smbdirect/rdma mounts Stefan Metzmacher
  2025-06-18 16:51 ` [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma() Stefan Metzmacher
@ 2025-06-18 16:51 ` Stefan Metzmacher
  2025-06-19 11:49   ` David Howells
  ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Stefan Metzmacher @ 2025-06-18 16:51 UTC (permalink / raw)
To: linux-cifs; +Cc: metze, Steve French, David Howells, Tom Talpey

We should not send smbdirect_data_transfer messages larger than
the negotiated max_send_size, typically 1364 bytes, which means
24 bytes of the smbdirect_data_transfer header + 1340 payload bytes.

This happened when doing an SMB2 write with more than 1340 bytes
(which is done inline as it's below rdma_readwrite_threshold).
When it happens, the peer resets the connection.

Note for stable: sp->max_send_size needs to be info->max_send_size:

@@ -895,7 +895,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
 		.direction = DMA_TO_DEVICE,
 	};
 	size_t payload_len = min_t(size_t, *_remaining_data_length,
-				   sp->max_send_size - sizeof(*packet));
+				   info->max_send_size - sizeof(*packet));
 
 	rc = smb_extract_iter_to_rdma(iter, payload_len,
 				      &extract);

cc: Steve French <sfrench@samba.org>
cc: David Howells <dhowells@redhat.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
---
 fs/smb/client/smbdirect.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
index cbc85bca006f..3a41dcbbff81 100644
--- a/fs/smb/client/smbdirect.c
+++ b/fs/smb/client/smbdirect.c
@@ -842,7 +842,7 @@ static int smbd_post_send(struct smbd_connection *info,
 
 static int smbd_post_send_iter(struct smbd_connection *info,
 			       struct iov_iter *iter,
-			       int *_remaining_data_length)
+			       unsigned int *_remaining_data_length)
 {
 	struct smbdirect_socket *sc = &info->socket;
 	struct smbdirect_socket_parameters *sp = &sc->parameters;
@@ -907,8 +907,10 @@ static int smbd_post_send_iter(struct smbd_connection *info,
 		.local_dma_lkey = sc->ib.pd->local_dma_lkey,
 		.direction = DMA_TO_DEVICE,
 	};
+	size_t payload_len = min_t(size_t, *_remaining_data_length,
+				   sp->max_send_size - sizeof(*packet));
 
-	rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length,
+	rc = smb_extract_iter_to_rdma(iter, payload_len,
 				      &extract);
 	if (rc < 0)
 		goto err_dma;
@@ -970,8 +972,16 @@ static int smbd_post_send_iter(struct smbd_connection *info,
 	request->sge[0].lkey = sc->ib.pd->local_dma_lkey;
 
 	rc = smbd_post_send(info, request);
-	if (!rc)
+	if (!rc) {
+		if (iter && iov_iter_count(iter) > 0) {
+			/*
+			 * There is more data to send
+			 */
+			goto wait_credit;
+		}
+
 		return 0;
+	}
 
 err_dma:
 	for (i = 0; i < request->num_sge; i++)
@@ -1007,7 +1017,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
  */
 static int smbd_post_send_empty(struct smbd_connection *info)
 {
-	int remaining_data_length = 0;
+	unsigned int remaining_data_length = 0;
 
 	info->count_send_empty++;
 	return smbd_post_send_iter(info, NULL, &remaining_data_length);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
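[Editor's note: the clamping this patch introduces can be modelled on its own. Every smbdirect_data_transfer message has room for at most max_send_size minus the 24-byte header, and the new wait_credit path keeps posting sends until the remaining data is drained. A rough stand-alone sketch; the helper name is hypothetical and the constants come from the patch description:]

```c
#include <assert.h>
#include <stddef.h>

#define SMBDIRECT_HDR_SIZE 24	/* sizeof(struct smbdirect_data_transfer) */

/*
 * Model of the patched send path: clamp each payload to the negotiated
 * max_send_size minus the header, and count how many messages must be
 * posted before the remaining data is fully transmitted.
 */
static int sends_posted(size_t remaining, size_t max_send_size)
{
	size_t room = max_send_size - SMBDIRECT_HDR_SIZE;
	int msgs = 0;

	while (remaining > 0) {
		/* min_t(size_t, *_remaining_data_length,
		 *       sp->max_send_size - sizeof(*packet)) */
		size_t payload = remaining < room ? remaining : room;

		remaining -= payload;
		msgs++;		/* one smbd_post_send() per message */
	}
	return msgs;
}
```

With the typical negotiated max_send_size of 1364 (1340 payload bytes), a 1341-byte inline SMB2 write now goes out as two conforming messages rather than one oversized message that makes the peer reset the connection.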
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
  2025-06-18 16:51 ` [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data Stefan Metzmacher
@ 2025-06-19 11:49   ` David Howells
  2025-06-19 19:22   ` Tom Talpey
  2025-06-25  8:00   ` David Howells
  2 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2025-06-19 11:49 UTC (permalink / raw)
To: Stefan Metzmacher; +Cc: dhowells, linux-cifs, Steve French, Tom Talpey

Stefan Metzmacher <metze@samba.org> wrote:

> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> index cbc85bca006f..3a41dcbbff81 100644
> --- a/fs/smb/client/smbdirect.c
> +++ b/fs/smb/client/smbdirect.c
> @@ -842,7 +842,7 @@ static int smbd_post_send(struct smbd_connection *info,
> 
>  static int smbd_post_send_iter(struct smbd_connection *info,
>  			       struct iov_iter *iter,
> -			       int *_remaining_data_length)
> +			       unsigned int *_remaining_data_length)

I would recommend size_t...

>  {
>  	struct smbdirect_socket *sc = &info->socket;
>  	struct smbdirect_socket_parameters *sp = &sc->parameters;
> @@ -907,8 +907,10 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>  		.local_dma_lkey = sc->ib.pd->local_dma_lkey,
>  		.direction = DMA_TO_DEVICE,
>  	};
> +	size_t payload_len = min_t(size_t, *_remaining_data_length,
> +				   sp->max_send_size - sizeof(*packet));

... and umin().

> 
> -	rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length,
> +	rc = smb_extract_iter_to_rdma(iter, payload_len,
>  				      &extract);
>  	if (rc < 0)
>  		goto err_dma;
> @@ -970,8 +972,16 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>  	request->sge[0].lkey = sc->ib.pd->local_dma_lkey;
> 
>  	rc = smbd_post_send(info, request);
> -	if (!rc)
> +	if (!rc) {
> +		if (iter && iov_iter_count(iter) > 0) {
> +			/*
> +			 * There is more data to send
> +			 */
> +			goto wait_credit;
> +		}
> +

This fix isn't mentioned in the patch description.

>  		return 0;
> +	}
> 
>  err_dma:
>  	for (i = 0; i < request->num_sge; i++)
> @@ -1007,7 +1017,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>   */
>  static int smbd_post_send_empty(struct smbd_connection *info)
>  {
> -	int remaining_data_length = 0;
> +	unsigned int remaining_data_length = 0;

size_t?

> 
>  	info->count_send_empty++;
>  	return smbd_post_send_iter(info, NULL, &remaining_data_length);

Apart from the above:

Reviewed-by: David Howells <dhowells@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: Tom Talpey @ 2025-06-19 19:22 UTC
To: Stefan Metzmacher, linux-cifs; +Cc: Steve French, David Howells

On 6/18/2025 12:51 PM, Stefan Metzmacher wrote:
> We should not send smbdirect_data_transfer messages larger than
> the negotiated max_send_size, typically 1364 bytes, which means
> 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes.
>
> This happened when doing an SMB2 write with more than 1340 bytes
> (which is done inline as it's below rdma_readwrite_threshold).
>
> It means the peer resets the connection.

Obviously needs fixing but I'm unclear on the proposed change.
See below.

> Note for stable sp->max_send_size needs to be info->max_send_size:

So this is important and maybe needs more than this comment, which
is not really something that should go upstream since future stable
kernels won't apply. Recommend deleting this and sending a separate
patch.

> @@ -895,7 +895,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>  		.direction = DMA_TO_DEVICE,
>  	};
>  	size_t payload_len = min_t(size_t, *_remaining_data_length,
> -				   sp->max_send_size - sizeof(*packet));
> +				   info->max_send_size - sizeof(*packet));
>
>  	rc = smb_extract_iter_to_rdma(iter, payload_len,
>  				      &extract);
>
> cc: Steve French <sfrench@samba.org>
> cc: David Howells <dhowells@redhat.com>
> cc: Tom Talpey <tom@talpey.com>
> cc: linux-cifs@vger.kernel.org
> Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
> Signed-off-by: Stefan Metzmacher <metze@samba.org>
> ---
>  fs/smb/client/smbdirect.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> index cbc85bca006f..3a41dcbbff81 100644
> --- a/fs/smb/client/smbdirect.c
> +++ b/fs/smb/client/smbdirect.c
> @@ -842,7 +842,7 @@ static int smbd_post_send(struct smbd_connection *info,
>
>  static int smbd_post_send_iter(struct smbd_connection *info,
>  			       struct iov_iter *iter,
> -			       int *_remaining_data_length)
> +			       unsigned int *_remaining_data_length)
>  {
>  	struct smbdirect_socket *sc = &info->socket;
>  	struct smbdirect_socket_parameters *sp = &sc->parameters;
> @@ -907,8 +907,10 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>  		.local_dma_lkey = sc->ib.pd->local_dma_lkey,
>  		.direction = DMA_TO_DEVICE,
>  	};
> +	size_t payload_len = min_t(size_t, *_remaining_data_length,
> +				   sp->max_send_size - sizeof(*packet));
>
> -	rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length,
> +	rc = smb_extract_iter_to_rdma(iter, payload_len,
>  				      &extract);
>  	if (rc < 0)
>  		goto err_dma;
> @@ -970,8 +972,16 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>  	request->sge[0].lkey = sc->ib.pd->local_dma_lkey;
>
>  	rc = smbd_post_send(info, request);
> -	if (!rc)
> +	if (!rc) {
> +		if (iter && iov_iter_count(iter) > 0) {
> +			/*
> +			 * There is more data to send
> +			 */
> +			goto wait_credit;

But, shouldn't the caller have done this overflow check, and looped on
the fragments and credits? It seems wrong to push the credit check down
to this level.

> +		}
> +
>  		return 0;
> +	}
>
>  err_dma:
>  	for (i = 0; i < request->num_sge; i++)
> @@ -1007,7 +1017,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>   */
>  static int smbd_post_send_empty(struct smbd_connection *info)
>  {
> -	int remaining_data_length = 0;
> +	unsigned int remaining_data_length = 0;

Does this fix something??

>
>  	info->count_send_empty++;
>  	return smbd_post_send_iter(info, NULL, &remaining_data_length);

Tom.
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: David Howells @ 2025-06-20 12:29 UTC
To: Tom Talpey; +Cc: dhowells, Stefan Metzmacher, linux-cifs, Steve French

Tom Talpey <tom@talpey.com> wrote:

> > +		if (iter && iov_iter_count(iter) > 0) {
> > +			/*
> > +			 * There is more data to send
> > +			 */
> > +			goto wait_credit;
>
> But, shouldn't the caller have done this overflow check, and looped on
> the fragments and credits? It seems wrong to push the credit check down
> to this level.

Fair point.  There's retry handling in the netfs layer - though that only
applies to reads and writes that go through that.  Can RDMA be used to
transfer data for other large calls?  Dir enumeration or ioctl, for instance.

David
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: Tom Talpey @ 2025-06-20 13:33 UTC
To: David Howells; +Cc: Stefan Metzmacher, linux-cifs, Steve French

On 6/20/2025 8:29 AM, David Howells wrote:
> Tom Talpey <tom@talpey.com> wrote:
>
>>> +		if (iter && iov_iter_count(iter) > 0) {
>>> +			/*
>>> +			 * There is more data to send
>>> +			 */
>>> +			goto wait_credit;
>>
>> But, shouldn't the caller have done this overflow check, and looped on
>> the fragments and credits? It seems wrong to push the credit check down
>> to this level.
>
> Fair point. There's retry handling in the netfs layer - though that only
> applies to reads and writes that go through that. Can RDMA be used to
> transfer data for other large calls? Dir enumeration or ioctl, for instance.

No, the SMB3 protocol only uses RDMA direct placement for the payload
of a read or write. All message headers, plus fsctl and other operation
payloads are transported via these smbdirect datagrams.

Most SMB3 messages are of a fairly small and well-defined size so this
greatly simplifies the transport layering. Fsctls are limited to 64KB
and since they generally transmit structured data (as opposed to bulk),
they don't benefit much from RDMA either.

My comment on the layering was because the code did not use to do
this, therefore it seems this is a fundamental change, which I'd greatly
prefer to avoid.

Tom.
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: David Howells @ 2025-06-20 14:56 UTC
To: Tom Talpey; +Cc: dhowells, Stefan Metzmacher, linux-cifs, Steve French

Tom Talpey <tom@talpey.com> wrote:

> My comment on the layering was because the code did not use to do
> this, therefore it seems this is a fundamental change, which I'd greatly
> prefer to avoid.

Yeah.  Ideally, we'd have each netfs subrequest corresponding to one RDMA
operation.  I try to size the subreqs appropriately, both in terms of the
maximum size in bytes and the maximum number of segments in
cifs_prepare_read() and cifs_prepare_write().

David
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: David Howells @ 2025-06-25 7:59 UTC
To: Tom Talpey, Stefan Metzmacher; +Cc: dhowells, linux-cifs, Steve French

David Howells <dhowells@redhat.com> wrote:

> > > +		if (iter && iov_iter_count(iter) > 0) {
> > > +			/*
> > > +			 * There is more data to send
> > > +			 */
> > > +			goto wait_credit;
> >
> > But, shouldn't the caller have done this overflow check, and looped on
> > the fragments and credits? It seems wrong to push the credit check down
> > to this level.
>
> Fair point.  There's retry handling in the netfs layer - though that only
> applies to reads and writes that go through that.  Can RDMA be used to
> transfer data for other large calls?  Dir enumeration or ioctl, for instance.

Actually, I'm wrong.  We do need this because we can come down this path from
non-netfs generated RPC ops.  I stuck a WARN_ON_ONCE() on the path to see
what generated it, and got:

        WARNING: CPU: 0 PID: 6773 at fs/smb/client/smbdirect.c:980 smbd_post_send_iter+0x768/0x840
        ...
        RIP: 0010:smbd_post_send_iter+0x768/0x840
        ...
        Call Trace:
         <TASK>
         smbd_send+0x1bb/0x280
         ? __smb_send_rqst+0x7c/0x3c0
         __smb_send_rqst+0x7c/0x3c0
         ? rb_erase+0x30/0x280
         smb_send_rqst+0x6a/0x150
         ? remove_hrtimer+0x5e/0x70
         compound_send_recv+0x31b/0x650
         ? __kmalloc_noprof+0x262/0x290
         ? kmem_cache_debug_flags+0xc/0x20
         cifs_send_recv+0x1f/0x30
         SMB2_open+0x22d/0x4b0
         ? smb2_open_file+0xd3/0x310
         smb2_open_file+0xd3/0x310
         cifs_nt_open+0x182/0x280
         cifs_open+0x463/0x650
         ? __pfx_cifs_open+0x10/0x10
         ? do_dentry_open+0x218/0x390
         do_dentry_open+0x218/0x390
         vfs_open+0x28/0x50
         do_open+0x216/0x2c0
         path_openat+0x140/0x1b0
         do_filp_open+0xb8/0x120
         ? kmem_cache_debug_flags+0xc/0x20
         ? kmem_cache_alloc_noprof+0x201/0x230
         ? getname_flags.part.0+0x24/0x180
         do_sys_openat2+0x6e/0xc0
         do_sys_open+0x37/0x60
         __x64_sys_openat+0x1b/0x30
         do_syscall_64+0x80/0x170
         entry_SYSCALL_64_after_hwframe+0x71/0x79

David
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: Stefan Metzmacher @ 2025-06-23 15:46 UTC
To: Tom Talpey, linux-cifs; +Cc: Steve French, David Howells

Hi Tom,

> On 6/18/2025 12:51 PM, Stefan Metzmacher wrote:
>> We should not send smbdirect_data_transfer messages larger than
>> the negotiated max_send_size, typically 1364 bytes, which means
>> 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes.
>>
>> This happened when doing an SMB2 write with more than 1340 bytes
>> (which is done inline as it's below rdma_readwrite_threshold).
>>
>> It means the peer resets the connection.
>
> Obviously needs fixing but I'm unclear on the proposed change.
> See below.
>
>> Note for stable sp->max_send_size needs to be info->max_send_size:
>
> So this is important and maybe needs more than this comment, which
> is not really something that should go upstream since future stable
> kernels won't apply. Recommend deleting this and sending a separate
> patch.

I can skip it, but I think it might be very useful for
the one who needs to do the conflict resolution for the backport.

Currently master is the only branch that has 'sp->',
so all current backports will need the change.

@Steve what would you prefer? Should I remove the hint and
conflict resolution diff? In the past I saw something similar
in merge requests sent to Linus in order to make it easier for
him to resolve the git conflicts.

As it's preferred to backport fixes from master, I don't
think it's a good idea to send a separate patch for the backports.

>> @@ -895,7 +895,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>>  		.direction = DMA_TO_DEVICE,
>>  	};
>>  	size_t payload_len = min_t(size_t, *_remaining_data_length,
>> -				   sp->max_send_size - sizeof(*packet));
>> +				   info->max_send_size - sizeof(*packet));
>>
>>  	rc = smb_extract_iter_to_rdma(iter, payload_len,
>>  				      &extract);
>>
>> cc: Steve French <sfrench@samba.org>
>> cc: David Howells <dhowells@redhat.com>
>> cc: Tom Talpey <tom@talpey.com>
>> cc: linux-cifs@vger.kernel.org
>> Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
>> Signed-off-by: Stefan Metzmacher <metze@samba.org>
>> ---
>>  fs/smb/client/smbdirect.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
>> index cbc85bca006f..3a41dcbbff81 100644
>> --- a/fs/smb/client/smbdirect.c
>> +++ b/fs/smb/client/smbdirect.c
>> @@ -842,7 +842,7 @@ static int smbd_post_send(struct smbd_connection *info,
>>
>>  static int smbd_post_send_iter(struct smbd_connection *info,
>>  			       struct iov_iter *iter,
>> -			       int *_remaining_data_length)
>> +			       unsigned int *_remaining_data_length)
>>  {
>>  	struct smbdirect_socket *sc = &info->socket;
>>  	struct smbdirect_socket_parameters *sp = &sc->parameters;
>> @@ -907,8 +907,10 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>>  		.local_dma_lkey = sc->ib.pd->local_dma_lkey,
>>  		.direction = DMA_TO_DEVICE,
>>  	};
>> +	size_t payload_len = min_t(size_t, *_remaining_data_length,
>> +				   sp->max_send_size - sizeof(*packet));
>>
>> -	rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length,
>> +	rc = smb_extract_iter_to_rdma(iter, payload_len,
>>  				      &extract);
>>  	if (rc < 0)
>>  		goto err_dma;
>> @@ -970,8 +972,16 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>>  	request->sge[0].lkey = sc->ib.pd->local_dma_lkey;
>>
>>  	rc = smbd_post_send(info, request);
>> -	if (!rc)
>> +	if (!rc) {
>> +		if (iter && iov_iter_count(iter) > 0) {
>> +			/*
>> +			 * There is more data to send
>> +			 */
>> +			goto wait_credit;
>
> But, shouldn't the caller have done this overflow check, and looped on
> the fragments and credits? It seems wrong to push the credit check down
> to this level.

At least for the caller I guess we want a function that sends
the whole iter and smbd_post_send_iter() only gets the iter as argument
with an implicit length.

To avoid this 'goto wait_credit', we could do something like this in
the caller:

 fs/smb/client/smbdirect.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
index 3a41dcbbff81..e0ba9395ff42 100644
--- a/fs/smb/client/smbdirect.c
+++ b/fs/smb/client/smbdirect.c
@@ -2042,17 +2042,24 @@ int smbd_send(struct TCP_Server_Info *server,
 			klen += rqst->rq_iov[i].iov_len;
 		iov_iter_kvec(&iter, ITER_SOURCE, rqst->rq_iov, rqst->rq_nvec, klen);

-		rc = smbd_post_send_iter(info, &iter, &remaining_data_length);
+		while (iov_iter_count(&iter) > 0) {
+			rc = smbd_post_send_iter(info, &iter,
+						 &remaining_data_length);
+			if (rc < 0)
+				break;
+		}
 		if (rc < 0)
 			break;

-		if (iov_iter_count(&rqst->rq_iter) > 0) {
+		while (iov_iter_count(&rqst->rq_iter) > 0) {
 			/* And then the data pages if there are any */
 			rc = smbd_post_send_iter(info, &rqst->rq_iter,
 						 &remaining_data_length);
 			if (rc < 0)
 				break;
 		}
+		if (rc < 0)
+			break;

 	} while (++rqst_idx < num_rqst);

But to me that also doesn't look pretty.

Or we rename the current smbd_post_send_iter() to smbd_post_send_iter_chunk()
and implement smbd_post_send_iter() as a loop over smbd_post_send_iter_chunk().

I think currently we want a small patch to actually fix the regression.

>> +		}
>> +
>>  		return 0;
>> +	}
>>
>>  err_dma:
>>  	for (i = 0; i < request->num_sge; i++)
>> @@ -1007,7 +1017,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
>>   */
>>  static int smbd_post_send_empty(struct smbd_connection *info)
>>  {
>> -	int remaining_data_length = 0;
>> +	unsigned int remaining_data_length = 0;
>
> Does this fix something??

I guess if I use umin() (as proposed by David) we don't strictly need that
change.

So I'd prefer to go with skipping the int vs unsigned change and use
umin() and keep the rest of the patch as is.

Thanks!
metze
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: Steve French @ 2025-06-23 17:28 UTC
To: Stefan Metzmacher, Sasha Levin; +Cc: Tom Talpey, linux-cifs, David Howells

Checkpatch will complain if you put a diff in the description so I
suspect a link in the mainline patch description to the stable
backported version of the patch is better than a diff in the
description.

Sasha,
What is your preference for how to send a patch to mainline, that will
not apply to stable, but a rebased version of patch is available for
stable? Should the upstream patch have link to the stable rebased
version? or just handle it as followup on the stable mailing list?

On Mon, Jun 23, 2025 at 10:52 AM Stefan Metzmacher <metze@samba.org> wrote:
>
> Hi Tom,
>
> > On 6/18/2025 12:51 PM, Stefan Metzmacher wrote:
> >> We should not send smbdirect_data_transfer messages larger than
> >> the negotiated max_send_size, typically 1364 bytes, which means
> >> 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes.
> >>
> >> This happened when doing an SMB2 write with more than 1340 bytes
> >> (which is done inline as it's below rdma_readwrite_threshold).
> >>
> >> It means the peer resets the connection.
> >
> > Obviously needs fixing but I'm unclear on the proposed change.
> > See below.
> >
> >> Note for stable sp->max_send_size needs to be info->max_send_size:
> >
> > So this is important and maybe needs more than this comment, which
> > is not really something that should go upstream since future stable
> > kernels won't apply. Recommend deleting this and sending a separate
> > patch.
>
> I can skip it, but I think it might be very useful for
> the one who needs to do the conflict resolution for the backport.
>
> Currently master is the only branch that has 'sp->',
> so all current backports will need the change.
>
> @Steve what would you prefer? Should I remove the hint and
> conflict resolution diff? In the past I saw something similar
> in merge requests sent to Linus in order to make it easier for
> him to resolve the git conflicts.
>
> As it's preferred to backport fixes from master, I don't
> think it's a good idea to send a separate patch for the backports.
>
> >> @@ -895,7 +895,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
> >>  		.direction = DMA_TO_DEVICE,
> >>  	};
> >>  	size_t payload_len = min_t(size_t, *_remaining_data_length,
> >> -				   sp->max_send_size - sizeof(*packet));
> >> +				   info->max_send_size - sizeof(*packet));
> >>
> >>  	rc = smb_extract_iter_to_rdma(iter, payload_len,
> >>  				      &extract);
> >>
> >> cc: Steve French <sfrench@samba.org>
> >> cc: David Howells <dhowells@redhat.com>
> >> cc: Tom Talpey <tom@talpey.com>
> >> cc: linux-cifs@vger.kernel.org
> >> Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator")
> >> Signed-off-by: Stefan Metzmacher <metze@samba.org>
> >> ---
> >>  fs/smb/client/smbdirect.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> >> index cbc85bca006f..3a41dcbbff81 100644
> >> --- a/fs/smb/client/smbdirect.c
> >> +++ b/fs/smb/client/smbdirect.c
> >> @@ -842,7 +842,7 @@ static int smbd_post_send(struct smbd_connection *info,
> >>
> >>  static int smbd_post_send_iter(struct smbd_connection *info,
> >>  			       struct iov_iter *iter,
> >> -			       int *_remaining_data_length)
> >> +			       unsigned int *_remaining_data_length)
> >>  {
> >>  	struct smbdirect_socket *sc = &info->socket;
> >>  	struct smbdirect_socket_parameters *sp = &sc->parameters;
> >> @@ -907,8 +907,10 @@ static int smbd_post_send_iter(struct smbd_connection *info,
> >>  		.local_dma_lkey = sc->ib.pd->local_dma_lkey,
> >>  		.direction = DMA_TO_DEVICE,
> >>  	};
> >> +	size_t payload_len = min_t(size_t, *_remaining_data_length,
> >> +				   sp->max_send_size - sizeof(*packet));
> >>
> >> -	rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length,
> >> +	rc = smb_extract_iter_to_rdma(iter, payload_len,
> >>  				      &extract);
> >>  	if (rc < 0)
> >>  		goto err_dma;
> >> @@ -970,8 +972,16 @@ static int smbd_post_send_iter(struct smbd_connection *info,
> >>  	request->sge[0].lkey = sc->ib.pd->local_dma_lkey;
> >>
> >>  	rc = smbd_post_send(info, request);
> >> -	if (!rc)
> >> +	if (!rc) {
> >> +		if (iter && iov_iter_count(iter) > 0) {
> >> +			/*
> >> +			 * There is more data to send
> >> +			 */
> >> +			goto wait_credit;
> >
> > But, shouldn't the caller have done this overflow check, and looped on
> > the fragments and credits? It seems wrong to push the credit check down
> > to this level.
>
> At least for the caller I guess we want a function that sends
> the whole iter and smbd_post_send_iter() only gets the iter as argument
> with an implicit length.
>
> To avoid this 'goto wait_credit', we could do something like this in
> the caller:
>
>  fs/smb/client/smbdirect.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> index 3a41dcbbff81..e0ba9395ff42 100644
> --- a/fs/smb/client/smbdirect.c
> +++ b/fs/smb/client/smbdirect.c
> @@ -2042,17 +2042,24 @@ int smbd_send(struct TCP_Server_Info *server,
>  			klen += rqst->rq_iov[i].iov_len;
>  		iov_iter_kvec(&iter, ITER_SOURCE, rqst->rq_iov, rqst->rq_nvec, klen);
>
> -		rc = smbd_post_send_iter(info, &iter, &remaining_data_length);
> +		while (iov_iter_count(&iter) > 0) {
> +			rc = smbd_post_send_iter(info, &iter,
> +						 &remaining_data_length);
> +			if (rc < 0)
> +				break;
> +		}
>  		if (rc < 0)
>  			break;
>
> -		if (iov_iter_count(&rqst->rq_iter) > 0) {
> +		while (iov_iter_count(&rqst->rq_iter) > 0) {
>  			/* And then the data pages if there are any */
>  			rc = smbd_post_send_iter(info, &rqst->rq_iter,
>  						 &remaining_data_length);
>  			if (rc < 0)
>  				break;
>  		}
> +		if (rc < 0)
> +			break;
>
>  	} while (++rqst_idx < num_rqst);
>
> But to me that also doesn't look pretty.
>
> Or we rename the current smbd_post_send_iter() to smbd_post_send_iter_chunk()
> and implement smbd_post_send_iter() as a loop over smbd_post_send_iter_chunk().
>
> I think currently we want a small patch to actually fix the regression.
>
> >> +		}
> >> +
> >>  		return 0;
> >> +	}
> >>
> >>  err_dma:
> >>  	for (i = 0; i < request->num_sge; i++)
> >> @@ -1007,7 +1017,7 @@ static int smbd_post_send_iter(struct smbd_connection *info,
> >>   */
> >>  static int smbd_post_send_empty(struct smbd_connection *info)
> >>  {
> >> -	int remaining_data_length = 0;
> >> +	unsigned int remaining_data_length = 0;
> >
> > Does this fix something??
>
> I guess if I use umin() (as proposed by David) we don't strictly need that
> change.
>
> So I'd prefer to go with skipping the int vs unsigned change and use
> umin() and keep the rest of the patch as is.
>
> Thanks!
> metze

--
Thanks,

Steve
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: Sasha Levin @ 2025-06-23 19:48 UTC
To: Steve French; +Cc: Stefan Metzmacher, Tom Talpey, linux-cifs, David Howells

On Mon, Jun 23, 2025 at 12:28:56PM -0500, Steve French wrote:
>Checkpatch will complain if you put a diff in the description so I
>suspect a link in the mainline patch description to the stable
>backported version of the patch is better than a diff in the
>description.
>
>Sasha,
>What is your preference for how to send a patch to mainline, that will
>not apply to stable, but a rebased version of patch is available for
>stable? Should the upstream patch have link to the stable rebased
>version? or just handle it as followup on the stable mailing list?

If you want us to not apply something, mark it with:

	Cc: <stable+noautosel@kernel.org> # reason goes here, and must be present

And just send a backport to stable@

--
Thanks,
Sasha
* Re: [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data
From: David Howells @ 2025-06-25 8:00 UTC
To: Stefan Metzmacher; +Cc: dhowells, linux-cifs, Steve French, Tom Talpey

Tested-by: David Howells <dhowells@redhat.com>
end of thread, other threads:[~2025-06-25  8:01 UTC | newest]

Thread overview: 15+ messages
2025-06-18 16:51 [PATCH 0/2] smb: client: fix problems with smbdirect/rdma mounts Stefan Metzmacher
2025-06-18 16:51 ` [PATCH 1/2] smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma() Stefan Metzmacher
2025-06-19 11:41   ` David Howells
2025-06-19 19:07     ` Tom Talpey
2025-06-18 16:51 ` [PATCH 2/2] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data Stefan Metzmacher
2025-06-19 11:49   ` David Howells
2025-06-19 19:22   ` Tom Talpey
2025-06-20 12:29     ` David Howells
2025-06-20 13:33       ` Tom Talpey
2025-06-20 14:56         ` David Howells
2025-06-25  7:59     ` David Howells
2025-06-23 15:46   ` Stefan Metzmacher
2025-06-23 17:28     ` Steve French
2025-06-23 19:48       ` Sasha Levin
2025-06-25  8:00 ` David Howells