* Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-03 18:18 UTC
To: Namjae Jeon, Tom Talpey
Cc: linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Hi Namjae,
I found the reason why the 6.17.9 code of transport_rdma.c deadlocks
with a Windows client when using irdma in roce mode, while the 6.18
code works fine.
irdma/roce in 6.17.9 code => deadlock in wait_for_rw_credits()
[ T8653] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[ T8653] ksmbd: smb_direct: max_rw_credits:9
[ T7013] ------------[ cut here ]------------
[ T7013] needed:31 > max:9
[ T7013] WARNING: CPU: 1 PID: 7013 at transport_rdma.c:975 wait_for_credits+0x3b8/0x430 [ksmbd]
When the client starts to send an array with a larger number of smb2_buffer_desc_v1
elements in a single SMB2 write request (most likely 31 in the above example),
wait_for_rw_credits() will simply deadlock, as there are only 9 credits possible
and 31 are requested.
In the 6.18 code we have commit 0bd73ae09ba1b73137d0830b21820d24700e09b1
("smb: server: allocate enough space for RW WRs and ib_drain_qp()").
It makes sure we allocate qp_attr.cap.max_rdma_ctxs and qp_attr.cap.max_send_wr
correctly. qp_attr.cap.max_rdma_ctxs was filled from sc->rw_io.credits.max before,
so I changed sc->rw_io.credits.max, but the two might need to be split from
each other.
After that change we no longer deadlock when the client starts sending
larger SMB2 writes with a larger number of smb2_buffer_desc_v1 elements;
159 credits are more than enough.
irdma/roce:
[ T6505] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
...
[ T6505] ksmbd: smb_direct: sc->rw_io.credits.num_pages=13 sc->rw_io.credits.max:159
My current theory about the Mellanox problem is that the number of pending
RDMA Read operations should be limited by the negotiated initiator_depth, which is at most
SMB_DIRECT_CM_INITIATOR_DEPTH (8), and that we're overflowing the hardware limits by
posting too many RDMA Read sqes.
The change in 0bd73ae09ba1b73137d0830b21820d24700e09b1 didn't change the
resulting values of sc->rw_io.credits.max for iwarp devices; it only adjusted
the number for qp_attr.cap.max_send_wr.
So for iwarp we deadlock in both versions of transport_rdma.c when
the client starts to send an array of 17 smb2_buffer_desc_v1 elements.
(I was able to see that using siw on the server, so that tcpdump was
able to capture it, see:
https://www.samba.org/~metze/caps/smb2/rdma/linux-6.18-regression/2025-12-03/rdma1-siw-r6.18-ace-fixed-hang-01-stream13.pcap.gz)
With roce it's directly using 17:
https://www.samba.org/~metze/caps/smb2/rdma/linux-6.18-regression/2025-12-03/rdma1-rxe-r6.18-race-fixed-rw-credits-reverted-hang-01.pcap.gz
The first few SMB2 writes use 2 smb2_buffer_desc_v1 elements and at the end
the Windows client switches to 17 smb2_buffer_desc_v1 elements.
irdma/iwarp:
[Wed Dec 3 13:45:22 2025] [ T7621] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:127
..
[Wed Dec 3 13:45:22 2025] [ T7621] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
...
[Wed Dec 3 13:45:22 2025] [ T8638] ------------[ cut here ]------------
[Wed Dec 3 13:45:22 2025] [ T8638] needed:17 > max:9
siw/iwarp:
[Wed Dec 3 13:49:30 2025] [ T7621] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
...
[Wed Dec 3 13:49:30 2025] [ T7621] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
...
[Wed Dec 3 13:49:30 2025] [ T9353] ------------[ cut here ]------------
[Wed Dec 3 13:49:30 2025] [ T9353] needed:17 > max:9
I've prepared 3 branches for testing:
for-6.18/ksmbd-smbdirect-regression-v1
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-6.18/ksmbd-smbdirect-regression-v1
This has some pr_notice() messages and a WARN_ONCE() for when the wait_for_rw_credits() deadlock happens.
for-6.18/ksmbd-smbdirect-regression-v2
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-6.18/ksmbd-smbdirect-regression-v2
This is based on for-6.18/ksmbd-smbdirect-regression-v1 but reverts
commit 0bd73ae09ba1b73137d0830b21820d24700e09b1, this might fix your setup.
for-6.18/ksmbd-smbdirect-regression-v3
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-6.18/ksmbd-smbdirect-regression-v3
This reverts everything to the state of v6.17.9, plus
some pr_notice() messages and a WARN_ONCE() for when the wait_for_rw_credits() deadlock happens.
Can you please test them, with priority on
for-6.18/ksmbd-smbdirect-regression-v2 first, and the others if you have
more time.
I typically use this running on a 6.18 kernel:
modprobe ksmbd
ksmbd.control -s
rmmod ksmbd
cd fs/smb/server
make -j$(getconf _NPROCESSORS_ONLN) -C /lib/modules/$(uname -r)/build M=$(pwd) KBUILD_MODPOST_WARN=1 modules
insmod ksmbd.ko
ksmbd.mountd
Then in one window:
bpftrace -e 'kprobe:smb_direct_rdma_xmit { printf("%s: %s pid=%d %s\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func); }'
And in another window:
dmesg -T -w
I assume the solution is to change smb_direct_rdma_xmit so that
it doesn't try to get credits for all RDMA read/write requests at once.
Instead, after collecting all ib_send_wr structures from all rdma_rw_ctx_wrs()
calls, we chunk the list to stay within the negotiated initiator depth
before passing it to ib_post_send().
At least we need to limit this for RDMA read requests; for RDMA write requests
we may not need to chunk and could post them all together, but chunking might
still be good in order to avoid blocking concurrent RDMA sends.
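
Roughly what I have in mind, just as a sketch (this is not the current
ksmbd code, and wait_for_chunk_completion() is only a placeholder for
waiting until the previously posted chunk has completed):

#include <rdma/ib_verbs.h>

/*
 * Sketch only: post the ib_send_wr chain collected from rdma_rw_ctx_wrs()
 * in chunks of at most the negotiated initiator depth, so we never have
 * more RDMA Read WRs outstanding than the peer advertised.
 */
static int post_chain_chunked(struct ib_qp *qp, struct ib_send_wr *first,
			      u32 initiator_depth)
{
	struct ib_send_wr *chunk = first;

	while (chunk) {
		const struct ib_send_wr *bad_wr;
		struct ib_send_wr *last = chunk;
		struct ib_send_wr *next;
		u32 n = 1;
		int ret;

		/* find the last WR that still fits into this chunk */
		while (last->next && n < initiator_depth) {
			last = last->next;
			n++;
		}
		next = last->next;
		last->next = NULL;		/* cut the chain temporarily */

		ret = ib_post_send(qp, chunk, &bad_wr);
		last->next = next;		/* restore the chain */
		if (ret)
			return ret;

		if (next)
			wait_for_chunk_completion(qp);	/* placeholder */

		chunk = next;
	}

	return 0;
}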
Tom, is this assumption correct?
Thanks!
metze

* Re: Problem with smbdirect rw credits and initiator_depth
From: Namjae Jeon @ 2025-12-04 0:07 UTC
To: Stefan Metzmacher
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

Hi Metze,

Okay, it seems like the issue has been improved in your v3 branch. If
you send the official patches, I will test it more.

Thanks.

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-04 9:39 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

Hi Namjae,

> Okay, it seems like the issue has been improved in your v3 branch. If
> you send the official patches, I will test it more.

It's good to have verified that for-6.18/ksmbd-smbdirect-regression-v3
on a 6.18 kernel behaves the same as 6.17.9, as transport_rdma.c
is the same, but it doesn't really allow forward progress on
the Mellanox problem.

Can you at least post the dmesg output generated by this:
https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7e724ebc58e986f4e101a55f4ab5e96912239918
Assuming that this wasn't triggered:
   if (WARN_ONCE(needed > max_possible, "needed:%u > max:%u\n", needed, max_possible))

Did you run the bpftrace command? Did it print a lot of
'smb_direct_rdma_xmit' messages over the whole time of the file copy?

Did you actually copy a file to or from the server?

Have you actually tested for-6.18/ksmbd-smbdirect-regression-v2,
as requested? I was hoping that it would work in the
same way as for-6.18/ksmbd-smbdirect-regression-v3,
but with only a single patch reverted.

I'll continue to fix the general problem so that this works
for non-Mellanox setups, as it seems it never worked at all :-(

Were you testing with RoCEv2 or Infiniband?

I think moving forward for Mellanox setups requires these steps:
- Test v1 vs. v2 and see that smb_direct_rdma_xmit is actually
  called at all. And see the dmesg output.
- Testing with Mellanox RoCEv2 on the client and rxe on
  the server, so that we can create a network capture with tcpdump.

Thanks!
metze

* Re: Problem with smbdirect rw credits and initiator_depth
From: Namjae Jeon @ 2025-12-05 2:33 UTC
To: Stefan Metzmacher
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On Thu, Dec 4, 2025 at 6:40 PM Stefan Metzmacher <metze@samba.org> wrote:
> Hi Namjae,

Hi Metze,

> Can you at least post the dmesg output generated by this:
> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7e724ebc58e986f4e101a55f4ab5e96912239918
> Assuming that this wasn't triggered:
>    if (WARN_ONCE(needed > max_possible, "needed:%u > max:%u\n", needed, max_possible))

I didn't know you wanted it. I will share it after office.

> Did you run the bpftrace command? Did it print a lot of
> 'smb_direct_rdma_xmit' messages over the whole time of the file copy?

No, I didn't check it, but I will try this.

> Did you actually copy a file to or from the server?

nod.

> Have you actually tested for-6.18/ksmbd-smbdirect-regression-v2,
> as requested?

I tested the v2 patch and the same issues still occurred, but they are
gone in v3.

> I'll continue to fix the general problem so that this works
> for non-Mellanox setups, as it seems it never worked at all :-(

Smbdirect should work well on Mellanox NICs. As I said before, most
people use this. I've rarely seen ksmbd users use smbdirect with
non-Mellanox NICs. If you want to have a stable, long-term smbdirect
feature on Samba, you'll need to have this device.

> Were you testing with RoCEv2 or Infiniband?

RoCEv2

> I think moving forward for Mellanox setups requires these steps:
> - Test v1 vs. v2 and see that smb_direct_rdma_xmit is actually
>   called at all. And see the dmesg output.
> - Testing with Mellanox RoCEv2 on the client and rxe on
>   the server, so that we can create a network capture with tcpdump.

Okay.

Thanks.

* Re: Problem with smbdirect rw credits and initiator_depth
From: Namjae Jeon @ 2025-12-05 12:21 UTC
To: Stefan Metzmacher
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

> > Can you at least post the dmesg output generated by this:
> > https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7e724ebc58e986f4e101a55f4ab5e96912239918
> > Assuming that this wasn't triggered:
> >    if (WARN_ONCE(needed > max_possible, "needed:%u > max:%u\n", needed, max_possible))
> I didn't know you wanted it. I will share it after office.

I have attached v2 and v3 logs. Let me know if you need something more.

> > Did you run the bpftrace command? Did it print a lot of
> > 'smb_direct_rdma_xmit' messages over the whole time of the file copy?
> No, I didn't check it, but I will try this.

/mnt# bpftrace ksmbd-rdma-xmit.bt
Attaching 1 probe...

The absence of any output after "Attaching 1 probe..." indicates that
the smb_direct_rdma_xmit function has not been called?

[-- Attachment #2: wip_v2.txt --]

[  414.784569] ksmbd: running
[  788.183310] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  788.183334] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  788.183337] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  788.183339] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  788.183342] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  788.193186] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  788.193209] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  788.193213] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  788.193216] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  788.193219] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  788.199731] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  788.199756] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  788.199760] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  788.199763] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  788.199766] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  788.208144] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  788.208162] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  788.208163] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  788.208165] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  788.208166] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  791.502337] ksmbd: smb_direct: disconnected
[  791.502344] ksmbd: Failed to send message: -107
[  791.502354] ksmbd: sock_read failed: -107
[  791.502378] ksmbd: Failed to send message: -107
[  791.502401] ksmbd: Failed to send message: -107
[  791.502421] ksmbd: Failed to send message: -107
[  791.503407] ksmbd: Failed to send message: -107
[  791.503438] ksmbd: Failed to send message: -107
[  791.503462] ksmbd: Failed to send message: -107
[  791.503820] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[  791.504756] ksmbd: Failed to send message: -107
[  791.504776] ksmbd: Failed to send message: -107
[  791.504793] ksmbd: Failed to send message: -107
[  791.504811] ksmbd: Failed to send message: -107
[  791.504828] ksmbd: Failed to send message: -107
[  791.504844] ksmbd: Failed to send message: -107
[  791.504878] ksmbd: Failed to send message: -107
[  791.504901] ksmbd: Failed to send message: -107
[  791.567962] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  791.568044] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  791.568051] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  791.568058] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  791.568064] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  801.925281] ksmbd: Failed to send message: -107
[  801.925365] ksmbd: Failed to send message: -107
[  801.925392] ksmbd: smb_direct: disconnected
[  801.925399] ksmbd: sock_read failed: -107
[  801.926651] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[  801.926665] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[  801.927717] ksmbd: Failed to send message: -107
[  801.927738] ksmbd: Failed to send message: -107
[  801.935940] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  801.935982] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  801.935988] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  801.935993] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  801.935999] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  805.259137] ksmbd: smb_direct: disconnected
[  805.259153] ksmbd: sock_read failed: -107
[  805.259165] ksmbd: Failed to send message: -107
[  805.259215] ksmbd: Failed to send message: -107
[  805.259748] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[  805.259767] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[  805.264034] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  805.264256] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  805.264264] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608
[  805.264269] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[  805.264275] ksmbd: smb_direct: max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90

[-- Attachment #3: wip_v3.txt --]

[  914.670491] ksmbd: running
[  978.636240] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  978.636258] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  978.636261] ksmbd: smb_direct: max_rdma_rw_size:8388608 pages_per_rw_credit:256 max_rw_credits:9
[  978.636263] ksmbd: smb_direct: max_send_sges:4 max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  978.655836] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  978.655863] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  978.655867] ksmbd: smb_direct: max_rdma_rw_size:8388608 pages_per_rw_credit:256 max_rw_credits:9
[  978.655871] ksmbd: smb_direct: max_send_sges:4 max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  978.667671] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  978.667697] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  978.667701] ksmbd: smb_direct: max_rdma_rw_size:8388608 pages_per_rw_credit:256 max_rw_credits:9
[  978.667705] ksmbd: smb_direct: max_send_sges:4 max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90
[  978.688873] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e
[  978.688894] ksmbd: smb_direct: initiator_depth:8 peer_initiator_depth:16
[  978.688896] ksmbd: smb_direct: max_rdma_rw_size:8388608 pages_per_rw_credit:256 max_rw_credits:9
[  978.688898] ksmbd: smb_direct: max_send_sges:4 max_sge_per_wr:30 wrs_per_credit:10 max_rw_wrs:90

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-08 16:13 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On 05.12.25 at 13:21, Namjae Jeon wrote:
> I have attached v2 and v3 logs. Let me know if you need something more.

Thanks!

A difference is that max_sgl_rd=3 is non 0, so it likely takes a
different code path compared to irdma (roce) and rxe. This can also be
forced with the force_mr=1 module parameter of ib_core...
It would be good to see captures while testing with rxe for ksmbd.

>>> Did you run the bpftrace command? Did it print a lot of
>>> 'smb_direct_rdma_xmit' messages over the whole time of the file copy?
>> No, I didn't check it, but I will try this.
> /mnt# bpftrace ksmbd-rdma-xmit.bt
> Attaching 1 probe...
>
> The absence of any output after "Attaching 1 probe..." indicates that
> the smb_direct_rdma_xmit function has not been called?

It seems so. Does the client have smb signing required?

I tested ksmbd with 'server signing = mandatory' and as far as I
remember copying a 5GB iso from and to the server worked fine.

I used this bpftrace script to show that smbdirect was really used,
but without RDMA read/write offload:

kprobe:smb_direct_rdma_xmit
{
	printf("%s: %s pid=%d %s: BEGIN\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}
kretprobe:smb_direct_rdma_xmit
{
	//printf("%s: %s pid=%d %s: RETURN\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}
kprobe:read_done
{
	printf("%s: %s pid=%d %s\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}
kprobe:write_done
{
	printf("%s: %s pid=%d %s\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}
kprobe:smb2_write
{
	printf("%s: %s pid=%d %s\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}
kprobe:smb2_read
{
	printf("%s: %s pid=%d %s\n", strftime("%F %H:%M:%S", nsecs(sw_tai)), comm, pid, func)
}

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-10 16:42 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On 05.12.25 at 13:21, Namjae Jeon wrote:
> /mnt# bpftrace ksmbd-rdma-xmit.bt
> Attaching 1 probe...
>
> The absence of any output after "Attaching 1 probe..." indicates that
> the smb_direct_rdma_xmit function has not been called?

Assuming the client requires signing, I may have found the
reason for a recv credit problem.

ksmbd uses this:

	smb_direct_max_fragmented_recv_size = 1024 * 1024
	smb_direct_max_receive_size = 1364;
	smb_direct_receive_credit_max = 255;

In order for the client to fill the full reassembly buffer,
all our recv buffers are moved into it, which means
255 * (1364 - 24) = 341700 (0x536C4) bytes of payload; after that we
are no longer able to grant any new recv credits to the peer, which
tries to send up to 1048576 (0x100000).

I found this using smbclient to download a large file
from a Windows server without using rdma offload.

So I guess you are seeing the problem when Windows
tries to copy a file to ksmbd.

For smbclient I made it work by changing
max_fragmented_recv_size to the minimum value of
131072 (0x20000); this value is smaller than
all local recv buffers combined: 255 * (1364 - 24) = 341700 (0x536C4).

I'll try to find what difference we have between 6.17.9
and 6.18 tomorrow.

In the meantime you may want to test if 6.18 with
smb_direct_max_fragmented_recv_size = 131072 works for you,
or change smb_direct_receive_credit_max = 1024.

metze
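
As a quick standalone check of the arithmetic above (a userspace
sketch; the constants are just the ksmbd defaults quoted in this mail):

#include <stdio.h>

int main(void)
{
	long recv_credit_max = 255;		/* ksmbd default */
	long max_recv_size = 1364;		/* ksmbd default */
	long data_header = 24;			/* smbdirect data transfer header */
	long max_fragmented_recv_size = 1024 * 1024;

	long payload = recv_credit_max * (max_recv_size - data_header);

	/* 341700 < 1048576: all recv buffers end up in the reassembly
	 * buffer before the peer's fragmented message is complete, so no
	 * new recv credits can be granted and the transfer stalls. */
	printf("recv buffer payload capacity: %ld\n", payload);
	printf("peer may send fragments up to: %ld\n", max_fragmented_recv_size);
	return 0;
}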

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-11 19:38 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On 10.12.25 at 17:42, Stefan Metzmacher wrote:
> ksmbd uses this:
>
> 	smb_direct_max_fragmented_recv_size = 1024 * 1024
> 	smb_direct_max_receive_size = 1364;
> 	smb_direct_receive_credit_max = 255;
>
> In order for the client to fill the full reassembly buffer,
> all our recv buffers are moved into it, which means
> 255 * (1364 - 24) = 341700 (0x536C4) bytes of payload; after that we
> are no longer able to grant any new recv credits to the peer, which
> tries to send up to 1048576 (0x100000).
>
> I'll try to find what difference we have between 6.17.9
> and 6.18 tomorrow.

The above is not a problem with 6.17.9 nor
with 6.18 in the server, as I found this logic
hiding in smb_direct_prepare():

	sp->max_fragmented_recv_size =
		(sp->recv_credit_max * sp->max_recv_size) / 2;

It explains why I saw the strange 173910 value in captures
(255 * 1364 / 2 = 173910)...

But this is broken in the client as it doesn't have this logic.

metze

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-12 9:58 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On 11.12.25 at 20:38, Stefan Metzmacher wrote:
> The above is not a problem with 6.17.9 nor
> with 6.18 in the server, as I found this logic
> hiding in smb_direct_prepare():
>
> 	sp->max_fragmented_recv_size =
> 		(sp->recv_credit_max * sp->max_recv_size) / 2;
>
> It explains why I saw the strange 173910 value in captures...

It also explains why the branch I proposed for 6.19
was worse than 6.18, as this logic got lost.

Today I tested with smbclient downloading a 32MB file
from ksmbd (in the state of 6.17.9, basically
for-6.18/ksmbd-smbdirect-regression-v3)
and was able to generate a problem that seems to
be very similar to what you are seeing with 6.18
and the Mellanox setup.

During the stream of fragments ksmbd sends
a keepalive pdu (RemainingLength = 0 and DataLength = 0)
granting credits; this truncates the smb2 read response.

Maybe a7eef6144c97bd7031d40ebc6e8fdd038ea3f46f
("smb: server: queue post_recv_credits_work in put_recvmsg() and avoid count_avail_recvmsg")
makes it more likely to happen, but I'm exploring that.

metze

* Re: Problem with smbdirect rw credits and initiator_depth
From: Stefan Metzmacher @ 2025-12-12 15:35 UTC
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

On 12.12.25 at 10:58, Stefan Metzmacher wrote:
> Today I tested with smbclient downloading a 32MB file
> from ksmbd (in the state of 6.17.9, basically
> for-6.18/ksmbd-smbdirect-regression-v3)
> and was able to generate a problem that seems to
> be very similar to what you are seeing with 6.18
> and the Mellanox setup.
>
> During the stream of fragments ksmbd sends
> a keepalive pdu (RemainingLength = 0 and DataLength = 0)
> granting credits; this truncates the smb2 read response.

I've fixed it like this:

 fs/smb/common/smbdirect/smbdirect_socket.h |  4 ++++
 fs/smb/server/transport_rdma.c             | 14 ++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/smb/common/smbdirect/smbdirect_socket.h b/fs/smb/common/smbdirect/smbdirect_socket.h
index ee4c2726771a..c541c9d0ae2d 100644
--- a/fs/smb/common/smbdirect/smbdirect_socket.h
+++ b/fs/smb/common/smbdirect/smbdirect_socket.h
@@ -178,6 +178,10 @@ struct smbdirect_socket {
 			wait_queue_head_t wait_queue;
 		} credits;
 
+		struct {
+			u32 remaining_data_length;
+		} batch;
+
 		/*
 		 * The state about posted/pending sends
 		 */
diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
index 03944be02b14..cc0d647aa9df 100644
--- a/fs/smb/server/transport_rdma.c
+++ b/fs/smb/server/transport_rdma.c
@@ -360,11 +360,16 @@ static void smb_direct_send_immediate_work(struct work_struct *work)
 {
 	struct smbdirect_socket *sc =
 		container_of(work, struct smbdirect_socket, idle.immediate_work);
+	int ret;
 
 	if (sc->status != SMBDIRECT_SOCKET_CONNECTED)
 		return;
 
-	smb_direct_post_send_data(sc, NULL, NULL, 0, 0);
+	ret = smb_direct_post_send_data(sc, NULL, NULL, 0, 0);
+	if (ret == -EBUSY) {
+		pr_notice("%s: skipped batch running:%u\n",
+			  __func__, sc->send_io.batch.remaining_data_length);
+	}
 }
 
 static void smb_direct_idle_connection_timer(struct work_struct *work)
@@ -1031,7 +1036,7 @@ static void smb_direct_post_recv_credits(struct work_struct *work)
 		}
 	}
 
-	if (credits)
+	if (credits && !sc->send_io.batch.remaining_data_length)
 		queue_work(sc->workqueue, &sc->idle.immediate_work);
 }
 
@@ -1427,6 +1432,11 @@ static int smb_direct_post_send_data(struct smbdirect_socket *sc,
 	int data_length;
 	struct scatterlist sg[SMBDIRECT_SEND_IO_MAX_SGE - 1];
 
+	if (send_ctx)
+		sc->send_io.batch.remaining_data_length = remaining_data_length;
+	else if (sc->send_io.batch.remaining_data_length)
+		return -EBUSY;
+
 	ret = wait_for_send_lcredit(sc, send_ctx);
 	if (ret)
 		goto lcredit_failed;

> Maybe a7eef6144c97bd7031d40ebc6e8fdd038ea3f46f
> ("smb: server: queue post_recv_credits_work in put_recvmsg() and avoid count_avail_recvmsg")
> makes it more likely to happen, but I'm exploring that.

This was not the reason for the response truncation, but it most likely
caused the problem you are hitting, as the logic for
queue_work(smb_direct_wq, &t->post_recv_credits_work); in recv_done()
was too strict and did not post more recv buffers.

I fixed it with this:

--- a/fs/smb/server/transport_rdma.c
+++ b/fs/smb/server/transport_rdma.c
@@ -645,6 +645,7 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc)
 		struct smbdirect_data_transfer *data_transfer =
 			(struct smbdirect_data_transfer *)recvmsg->packet;
 		u32 remaining_data_length, data_offset, data_length;
+		int current_recv_credits;
 		u16 old_recv_credit_target;
 
 		if (wc->byte_len <
@@ -683,7 +684,7 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc)
 		}
 
 		atomic_dec(&sc->recv_io.posted.count);
-		atomic_dec(&sc->recv_io.credits.count);
+		current_recv_credits = atomic_dec_return(&sc->recv_io.credits.count);
 
 		old_recv_credit_target = sc->recv_io.credits.target;
 		sc->recv_io.credits.target =
@@ -703,7 +704,8 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc)
 			wake_up(&sc->send_io.credits.wait_queue);
 
 		if (data_length) {
-			if (sc->recv_io.credits.target > old_recv_credit_target)
+			if (current_recv_credits <= (sc->recv_io.credits.target / 4) ||
+			    sc->recv_io.credits.target > old_recv_credit_target)
 				queue_work(sc->workqueue, &sc->recv_io.posted.refill_work);
 
 			enqueue_reassembly(sc, recvmsg, (int)data_length);

This seems to be similar to the logic in Windows: it grants 191 credits
once the peer is low on credits. 255 / 4 = ~64 and 255 - 64 = 191.

I've put these changes along with the rw credit fixes into my
for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
test this?

If that works I'll prepare real patches...

Note there's also a for-6.18/ksmbd-smbdirect-regression-v4+ branch,
but that's only for my own usage with my debug kernel that has some
backports required for the IPPROTO_SMBDIRECT patches...

metze

* Re: Problem with smbdirect rw credits and initiator_depth
From: Namjae Jeon @ 2025-12-13 2:14 UTC
To: Stefan Metzmacher
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org

> I've put these changes along with the rw credit fixes into my
> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
> test this?

Problems still occur. See:

[ 5734.595709] ksmbd: running
[ 5872.277551] ksmbd: smb_direct: dev[rocep1s0f0]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000
[ 5872.277575] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32
[ 5872.277606] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16
[ 5872.277612] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048
[ 5872.277619] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[ 5872.278221] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27
[ 5872.294920] ksmbd: smb_direct: dev[rocep1s0f0]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000
[ 5872.294929] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32
[ 5872.294941] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16
[ 5872.294942] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048
[ 5872.294956] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[ 5872.295110] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000
[ 5872.295116] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32
[ 5872.295125] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16
[ 5872.295126] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048
[ 5872.295128] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[ 5872.295144] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27
[ 5872.295276] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27
[ 5872.301380] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000
[ 5872.301386] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32
[ 5872.301395] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16
[ 5872.301396] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048
[ 5872.301398] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[ 5872.301536] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27
[ 5887.761125] ksmbd: smb_direct: disconnected
[ 5887.762410] ksmbd: Failed to send message: -107
[ 5887.762586] ksmbd: Failed to send message: -107
[ 5887.762775] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[ 5887.762794] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[ 5887.762830] ksmbd: Failed to send message: -107
[ 5887.762860] ksmbd: Failed to send message: -107
[ 5887.762888] ksmbd: Failed to send message: -107
[ 5887.762913] ksmbd: Failed to send message: -107
[ 5887.762967] ksmbd: Failed to send message: -107
[ 5887.763042] ksmbd: Failed to send message: -107
[ 5887.765363] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000
[ 5887.765385] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32
[ 5887.765416] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16
[ 5887.765422] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048
[ 5887.765428] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9
[ 5887.765919] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27
* Re: Problem with smbdirect rw credits and initiator_depth
2025-12-13 2:14 ` Namjae Jeon
@ 2025-12-14 22:56 ` Stefan Metzmacher
2025-12-15 20:17 ` Stefan Metzmacher
0 siblings, 1 reply; 24+ messages in thread
From: Stefan Metzmacher @ 2025-12-14 22:56 UTC (permalink / raw)
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Am 13.12.25 um 03:14 schrieb Namjae Jeon:
>> I've put these changes a long with rw credit fixes into my
>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
>> test this?
> Problems still occur. See:

:-( Would you be able to use rxe and take a network capture?

Using test files with all zeros, e.g.
dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1
would allow gzip --best on the capture file to compress well...

metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth
2025-12-14 22:56 ` Stefan Metzmacher
@ 2025-12-15 20:17 ` Stefan Metzmacher
2025-12-16 23:59 ` Namjae Jeon
2026-01-14 18:13 ` Stefan Metzmacher
0 siblings, 2 replies; 24+ messages in thread
From: Stefan Metzmacher @ 2025-12-15 20:17 UTC (permalink / raw)
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Am 14.12.25 um 23:56 schrieb Stefan Metzmacher:
> Am 13.12.25 um 03:14 schrieb Namjae Jeon:
>>> I've put these changes a long with rw credit fixes into my
>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
>>> test this?
>> Problems still occur. See:
>
> :-( Would you be able to use rxe and cake a network capture?
>
> Using test files with all zeros, e.g.
> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1
> would allow gzip --best on the capture file to compress well...

I think I found something that explains it; I was able to reproduce it,
and it matches what I have in mind.

We increment recv_io.posted.count after ib_post_recv()

And manage_credits_prior_sending() uses

new_credits = recv_io.posted.count - recv_io.credits.count

But there is a race between the hardware receiving a message
and recv_done being called in order to decrement recv_io.posted.count
again. During that race manage_credits_prior_sending() might grant
too many credits.

Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch,
I haven't tested this branch yet, I'm running out of time
for the day.

But I tested it with smbclient, having similar
logic in fs/smb/common/smbdirect/smbdirect_connection.c

metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
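A userspace sketch of the race described above may help; it models the two counters with C11 atomics instead of the kernel's atomic_t, and the names are simplified stand-ins for sc->recv_io.posted.count and sc->recv_io.credits.count, not the actual ksmbd fields.

#include <stdatomic.h>

static atomic_int posted;   /* receives currently posted to the QP */
static atomic_int granted;  /* credits already granted to the peer */

/* ib_post_recv() succeeded for one more buffer */
static void receive_posted(void)
{
	atomic_fetch_add(&posted, 1);
}

/* recv_done(): the peer consumed one posted receive and one credit */
static void receive_completed(void)
{
	atomic_fetch_sub(&posted, 1);
	atomic_fetch_sub(&granted, 1);
}

/*
 * manage_credits_prior_sending(): grant whatever is posted but not yet
 * granted.  If the hardware has already consumed a receive but
 * receive_completed() has not run yet, "posted" is still too high, so
 * this can hand out credits for buffers that are effectively gone.
 */
static int credits_to_grant(void)
{
	int new_credits = atomic_load(&posted) - atomic_load(&granted);

	if (new_credits < 0)
		new_credits = 0;
	atomic_fetch_add(&granted, new_credits);
	return new_credits;
}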
* Re: Problem with smbdirect rw credits and initiator_depth 2025-12-15 20:17 ` Stefan Metzmacher @ 2025-12-16 23:59 ` Namjae Jeon 2026-01-14 18:13 ` Stefan Metzmacher 1 sibling, 0 replies; 24+ messages in thread From: Namjae Jeon @ 2025-12-16 23:59 UTC (permalink / raw) To: Stefan Metzmacher Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org On Tue, Dec 16, 2025 at 5:17 AM Stefan Metzmacher <metze@samba.org> wrote: > > Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: > > Am 13.12.25 um 03:14 schrieb Namjae Jeon: > >>> I've put these changes a long with rw credit fixes into my > >>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to > >>> test this? > >> Problems still occur. See: > > > > :-( Would you be able to use rxe and cake a network capture? > > > > Using test files with all zeros, e.g. > > dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 > > would allow gzip --best on the capture file to compress well... > > I think I found something that explains it and > I was able to reproduce and what I have in mind. > > We increment recv_io.posted.count after ib_post_recv() > > And manage_credits_prior_sending() uses > > new_credits = recv_io.posted.count - recv_io.credits.count > > But there is a race between the hardware receiving a message > and recv_done being called in order to decrement recv_io.posted.count > again. During that race manage_credits_prior_sending() might grant > too much credits. > > Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, > I haven't tested this branch yet, I'm running out of time > for the day. [ 3395.803163] ksmbd: running [ 3480.416969] perf: interrupt took too long (2547 > 2500), lowering kernel.perf_event_max_sample_rate to 78500 [ 3576.875490] ksmbd: smb_direct: dev[rocep1s0f0]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3576.875564] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3576.875599] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3576.875605] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3576.875612] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3576.876219] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3576.894371] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3576.894398] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3576.894429] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3576.894435] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3576.894442] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3576.894968] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3576.908669] ksmbd: smb_direct: dev[rocep1s0f0]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3576.908694] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 
3576.908727] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3576.908733] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3576.908740] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3576.909251] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3576.920882] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3576.920912] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3576.920961] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3576.920968] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3576.920974] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3576.921687] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3594.013976] ksmbd: smb_direct: disconnected [ 3594.013986] ksmbd: Failed to send message: -107 [ 3594.013992] ksmbd: sock_read failed: -107 [ 3594.014578] ksmbd: Failed to send message: -107 [ 3594.014616] ksmbd: Failed to send message: -107 [ 3594.014632] ksmbd: Failed to send message: -107 [ 3594.014803] ksmbd: Failed to send message: -107 [ 3594.014820] ksmbd: Failed to send message: -107 [ 3594.014833] ksmbd: Failed to send message: -107 [ 3594.014844] ksmbd: Failed to send message: -107 [ 3594.014855] ksmbd: Failed to send message: -107 [ 3594.014866] ksmbd: Failed to send message: -107 [ 3594.014877] ksmbd: Failed to send message: -107 [ 3594.016795] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0 [ 3594.019203] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3594.019235] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3594.019279] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3594.019287] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3594.019293] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3594.020638] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3619.448809] ksmbd: Failed to send message: -107 [ 3619.448825] ksmbd: Failed to send message: -107 [ 3619.448833] ksmbd: Failed to send message: -107 [ 3619.448840] ksmbd: Failed to send message: -107 [ 3619.448846] ksmbd: Failed to send message: -107 [ 3619.449697] ksmbd: smb_direct: Send error. 
status='WR flushed (5)', opcode=0 [ 3619.449762] ksmbd: Failed to send message: -107 [ 3619.449773] ksmbd: smb_direct: disconnected [ 3619.453543] ksmbd: smb_direct: dev[rocep1s0f1]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3619.453552] ksmbd: smb_direct: dev[rocep1s0f1]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3619.453566] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3619.453568] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3619.453571] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3619.453770] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3626.073742] ksmbd: smb_direct: dev[rocep1s0f0]: iwarp=0 ib=0 roce=1 v1=1 v2=1 device_cap_flags=0x1425321c36 kernel_cap_flags=0x2e page_size_cap=0xfffffffffffff000 [ 3626.073753] ksmbd: smb_direct: dev[rocep1s0f0]: max_qp_rd_atom=16 max_qp_init_rd_atom=16 max_fast_reg_page_list_len=65536 max_sgl_rd=3 max_sge_rd=30 max_cqe=4194303 max_qp_wr=8192 max_send_sge=30 max_recv_sge=32 [ 3626.073769] ksmbd: smb_direct: initiator_depth:16 peer_initiator_depth:16 [ 3626.073772] ksmbd: smb_direct: max_send_sges=4 max_read_write_size=8388608 maxpages=2048 [ 3626.073775] ksmbd: smb_direct: sc->rw_io.credits.num_pages=256 sc->rw_io.credits.max:9 [ 3626.073995] ksmbd: smb_direct: max_rdma_ctxs=9 rdma_send_wr=27 [ 3626.087056] ksmbd: Failed to send message: -107 [ 3626.087072] ksmbd: Failed to send message: -107 [ 3626.087097] ksmbd: smb_direct: disconnected [ 3626.087098] ksmbd: Failed to send message: -107 [ 3626.087104] ksmbd: sock_read failed: -107 [ 3626.087118] ksmbd: Failed to send message: -107 [ 3626.087439] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0 [ 3626.087475] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0 [ 3626.087485] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0 [ 3626.089052] ksmbd: Failed to send message: -107 [ 3626.090507] ksmbd: Failed to send message: -107 [ 3626.090552] ksmbd: Failed to send message: -107 [ 3626.090580] ksmbd: Failed to send message: -107 [ 3626.092896] ksmbd: Failed to send message: -107 [ 3626.092931] ksmbd: Failed to send message: -107 [ 3626.095299] ksmbd: Failed to send message: -107 > > But I tested it with smbclient and having a similar > logic in fs/smb/common/smbdirect/smbdirect_connection.c > > metze ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth
2025-12-15 20:17 ` Stefan Metzmacher
2025-12-16 23:59 ` Namjae Jeon
@ 2026-01-14 18:13 ` Stefan Metzmacher
2026-01-15 2:01 ` Namjae Jeon
1 sibling, 1 reply; 24+ messages in thread
From: Stefan Metzmacher @ 2026-01-14 18:13 UTC (permalink / raw)
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Am 15.12.25 um 21:17 schrieb Stefan Metzmacher:
> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher:
>> Am 13.12.25 um 03:14 schrieb Namjae Jeon:
>>>> I've put these changes a long with rw credit fixes into my
>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
>>>> test this?
>>> Problems still occur. See:
>>
>> :-( Would you be able to use rxe and cake a network capture?
>>
>> Using test files with all zeros, e.g.
>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1
>> would allow gzip --best on the capture file to compress well...
>
> I think I found something that explains it and
> I was able to reproduce and what I have in mind.
>
> We increment recv_io.posted.count after ib_post_recv()
>
> And manage_credits_prior_sending() uses
>
> new_credits = recv_io.posted.count - recv_io.credits.count
>
> But there is a race between the hardware receiving a message
> and recv_done being called in order to decrement recv_io.posted.count
> again. During that race manage_credits_prior_sending() might grant
> too much credits.
>
> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch,
> I haven't tested this branch yet, I'm running out of time
> for the day.
>
> But I tested it with smbclient and having a similar
> logic in fs/smb/common/smbdirect/smbdirect_connection.c

I was able to reproduce the problem, and the fix I created in
for-6.18/ksmbd-smbdirect-regression-v5 was not correct.

I needed to use

available = atomic_xchg(&sc->recv_io.credits.available, 0);

instead of

available = atomic_read(&sc->recv_io.credits.available);
atomic_sub(new_credits, &sc->recv_io.credits.available);

The following branch works for me:
for-6.18/ksmbd-smbdirect-regression-v7
and with the fixes against master this should also work:
for-6.19/ksmbd-smbdirect-regression-v1

I'll post real patches tomorrow.

Please check.

Thanks!
metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
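For illustration, the difference between the two variants above, again sketched with C11 atomics and a simplified counter rather than the real sc->recv_io.credits.available field: the counter is bumped once per successfully posted receive, and granting should drain it in a single atomic step.

#include <stdatomic.h>

static atomic_int available;  /* posted receives not yet granted as credits */

/* called once per receive that was successfully posted */
static void receive_posted(void)
{
	atomic_fetch_add(&available, 1);
}

/*
 * Racy variant: two concurrent callers can both load the same value and
 * both grant it, handing out credits twice for the same posted receives.
 */
static int grant_credits_racy(void)
{
	int new_credits = atomic_load(&available);

	atomic_fetch_sub(&available, new_credits);
	return new_credits;
}

/* Fixed variant: take everything available in one atomic exchange, so
 * each posted receive is granted exactly once. */
static int grant_credits(void)
{
	return atomic_exchange(&available, 0);
}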
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-14 18:13 ` Stefan Metzmacher @ 2026-01-15 2:01 ` Namjae Jeon 2026-01-15 9:50 ` Stefan Metzmacher 0 siblings, 1 reply; 24+ messages in thread From: Namjae Jeon @ 2026-01-15 2:01 UTC (permalink / raw) To: Stefan Metzmacher Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote: > > Am 15.12.25 um 21:17 schrieb Stefan Metzmacher: > > Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: > >> Am 13.12.25 um 03:14 schrieb Namjae Jeon: > >>>> I've put these changes a long with rw credit fixes into my > >>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to > >>>> test this? > >>> Problems still occur. See: > >> > >> :-( Would you be able to use rxe and cake a network capture? > >> > >> Using test files with all zeros, e.g. > >> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 > >> would allow gzip --best on the capture file to compress well... > > > > I think I found something that explains it and > > I was able to reproduce and what I have in mind. > > > > We increment recv_io.posted.count after ib_post_recv() > > > > And manage_credits_prior_sending() uses > > > > new_credits = recv_io.posted.count - recv_io.credits.count > > > > But there is a race between the hardware receiving a message > > and recv_done being called in order to decrement recv_io.posted.count > > again. During that race manage_credits_prior_sending() might grant > > too much credits. > > > > Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, > > I haven't tested this branch yet, I'm running out of time > > for the day. > > > > But I tested it with smbclient and having a similar > > logic in fs/smb/common/smbdirect/smbdirect_connection.c > > I was able to reproduce the problem and the fix I created > for-6.18/ksmbd-smbdirect-regression-v5 was not correct. > > I needed to use > > available = atomic_xchg(&sc->recv_io.credits.available, 0); > > instead of > > available = atomic_read(&sc->recv_io.credits.available); > atomic_sub(new_credits, &sc->recv_io.credits.available); > > This following branch works for me: > for-6.18/ksmbd-smbdirect-regression-v7 > and with the fixes again master this should also work: > for-6.19/ksmbd-smbdirect-regression-v1 > > I'll post real patches tomorrow. > > Please check. Okay, I will test it with two branches. I'll try it too, but I recommend running frametest for performance difference and stress testing. https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest ex) frametest.exe -w 4k -t 20 -n 2000 Thanks. > > Thanks! > metze > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth
2026-01-15 2:01 ` Namjae Jeon
@ 2026-01-15 9:50 ` Stefan Metzmacher
2026-01-16 23:08 ` Stefan Metzmacher
0 siblings, 1 reply; 24+ messages in thread
From: Stefan Metzmacher @ 2026-01-15 9:50 UTC (permalink / raw)
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Am 15.01.26 um 03:01 schrieb Namjae Jeon:
> On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> Am 15.12.25 um 21:17 schrieb Stefan Metzmacher:
>>> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher:
>>>> Am 13.12.25 um 03:14 schrieb Namjae Jeon:
>>>>>> I've put these changes a long with rw credit fixes into my
>>>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to
>>>>>> test this?
>>>>> Problems still occur. See:
>>>>
>>>> :-( Would you be able to use rxe and cake a network capture?
>>>>
>>>> Using test files with all zeros, e.g.
>>>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1
>>>> would allow gzip --best on the capture file to compress well...
>>>
>>> I think I found something that explains it and
>>> I was able to reproduce and what I have in mind.
>>>
>>> We increment recv_io.posted.count after ib_post_recv()
>>>
>>> And manage_credits_prior_sending() uses
>>>
>>> new_credits = recv_io.posted.count - recv_io.credits.count
>>>
>>> But there is a race between the hardware receiving a message
>>> and recv_done being called in order to decrement recv_io.posted.count
>>> again. During that race manage_credits_prior_sending() might grant
>>> too much credits.
>>>
>>> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch,
>>> I haven't tested this branch yet, I'm running out of time
>>> for the day.
>>>
>>> But I tested it with smbclient and having a similar
>>> logic in fs/smb/common/smbdirect/smbdirect_connection.c
>>
>> I was able to reproduce the problem and the fix I created
>> for-6.18/ksmbd-smbdirect-regression-v5 was not correct.
>>
>> I needed to use
>>
>> available = atomic_xchg(&sc->recv_io.credits.available, 0);
>>
>> instead of
>>
>> available = atomic_read(&sc->recv_io.credits.available);
>> atomic_sub(new_credits, &sc->recv_io.credits.available);
>>
>> This following branch works for me:
>> for-6.18/ksmbd-smbdirect-regression-v7
>> and with the fixes again master this should also work:
>> for-6.19/ksmbd-smbdirect-regression-v1
>>
>> I'll post real patches tomorrow.
>>
>> Please check.
> Okay, I will test it with two branches.
> I'll try it too, but I recommend running frametest for performance
> difference and stress testing.
>
> https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest
>
> ex) frametest.exe -w 4k -t 20 -n 2000

That works fine, but

frametest.exe -r 4k -t 20 -n 2000

generates a continuous stream of such messages:
ksmbd: Failed to send message: -107

Both with 6.17.2 and for-6.19/ksmbd-smbdirect-regression-v1,
so this is not a regression.

I'll now check if this is related to the other problems
I found and fixed in for-6.18/ksmbd-smbdirect-regression-v5

metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-15 9:50 ` Stefan Metzmacher @ 2026-01-16 23:08 ` Stefan Metzmacher 2026-01-17 13:15 ` Stefan Metzmacher 0 siblings, 1 reply; 24+ messages in thread From: Stefan Metzmacher @ 2026-01-16 23:08 UTC (permalink / raw) To: Namjae Jeon Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org Am 15.01.26 um 10:50 schrieb Stefan Metzmacher: > Am 15.01.26 um 03:01 schrieb Namjae Jeon: >> On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote: >>> >>> Am 15.12.25 um 21:17 schrieb Stefan Metzmacher: >>>> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: >>>>> Am 13.12.25 um 03:14 schrieb Namjae Jeon: >>>>>>> I've put these changes a long with rw credit fixes into my >>>>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to >>>>>>> test this? >>>>>> Problems still occur. See: >>>>> >>>>> :-( Would you be able to use rxe and cake a network capture? >>>>> >>>>> Using test files with all zeros, e.g. >>>>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 >>>>> would allow gzip --best on the capture file to compress well... >>>> >>>> I think I found something that explains it and >>>> I was able to reproduce and what I have in mind. >>>> >>>> We increment recv_io.posted.count after ib_post_recv() >>>> >>>> And manage_credits_prior_sending() uses >>>> >>>> new_credits = recv_io.posted.count - recv_io.credits.count >>>> >>>> But there is a race between the hardware receiving a message >>>> and recv_done being called in order to decrement recv_io.posted.count >>>> again. During that race manage_credits_prior_sending() might grant >>>> too much credits. >>>> >>>> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, >>>> I haven't tested this branch yet, I'm running out of time >>>> for the day. >>>> >>>> But I tested it with smbclient and having a similar >>>> logic in fs/smb/common/smbdirect/smbdirect_connection.c >>> >>> I was able to reproduce the problem and the fix I created >>> for-6.18/ksmbd-smbdirect-regression-v5 was not correct. >>> >>> I needed to use >>> >>> available = atomic_xchg(&sc->recv_io.credits.available, 0); >>> >>> instead of >>> >>> available = atomic_read(&sc->recv_io.credits.available); >>> atomic_sub(new_credits, &sc->recv_io.credits.available); >>> >>> This following branch works for me: >>> for-6.18/ksmbd-smbdirect-regression-v7 >>> and with the fixes again master this should also work: >>> for-6.19/ksmbd-smbdirect-regression-v1 >>> >>> I'll post real patches tomorrow. >>> >>> Please check. >> Okay, I will test it with two branches. >> I'll try it too, but I recommend running frametest for performance >> difference and stress testing. >> >> https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest >> >> ex) frametest.exe -w 4k -t 20 -n 2000 > > That works fine, but > > frametest.exe -r 4k -t 20 -n 2000 > > generates a continues stream of such messages: > ksmbd: Failed to send message: -107 > > Both with 6.17.2 and for-6.19/ksmbd-smbdirect-regression-v1, > so this is not a regression. > > I'll now check if the is related to the other problems > I found and fixes in for-6.18/ksmbd-smbdirect-regression-v5 Ok, I found the problem. On send we are not allowed to consume the last send credit without granting any credit to the peer. MS-SMBD 3.1.5.1 Sending Upper Layer Messages ... If Connection.SendCredits is 1 and the CreditsGranted field of the message is 0, stop processing. ... MS-SMBD 3.1.5.9 Managing Credits Prior to Sending ... 
If Connection.ReceiveCredits is zero, or if Connection.SendCredits is one and the Connection.SendQueue is not empty, the sender MUST allocate and post at least one receive of size Connection.MaxReceiveSize and MUST increment Connection.ReceiveCredits by the number allocated and posted. If no receives are posted, the processing MUST return a value of zero to indicate to the caller that no Send message can be currently performed. ... It works in my master-ipproto-smbdirect branch, see the top commit. I'll backport the related logic to ksmbd on top of for-6.19/ksmbd-smbdirect-regression-v1 tomorrow. metze ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-16 23:08 ` Stefan Metzmacher @ 2026-01-17 13:15 ` Stefan Metzmacher 2026-01-18 8:03 ` Namjae Jeon 0 siblings, 1 reply; 24+ messages in thread From: Stefan Metzmacher @ 2026-01-17 13:15 UTC (permalink / raw) To: Namjae Jeon Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org Am 17.01.26 um 00:08 schrieb Stefan Metzmacher: > Am 15.01.26 um 10:50 schrieb Stefan Metzmacher: >> Am 15.01.26 um 03:01 schrieb Namjae Jeon: >>> On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote: >>>> >>>> Am 15.12.25 um 21:17 schrieb Stefan Metzmacher: >>>>> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: >>>>>> Am 13.12.25 um 03:14 schrieb Namjae Jeon: >>>>>>>> I've put these changes a long with rw credit fixes into my >>>>>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to >>>>>>>> test this? >>>>>>> Problems still occur. See: >>>>>> >>>>>> :-( Would you be able to use rxe and cake a network capture? >>>>>> >>>>>> Using test files with all zeros, e.g. >>>>>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 >>>>>> would allow gzip --best on the capture file to compress well... >>>>> >>>>> I think I found something that explains it and >>>>> I was able to reproduce and what I have in mind. >>>>> >>>>> We increment recv_io.posted.count after ib_post_recv() >>>>> >>>>> And manage_credits_prior_sending() uses >>>>> >>>>> new_credits = recv_io.posted.count - recv_io.credits.count >>>>> >>>>> But there is a race between the hardware receiving a message >>>>> and recv_done being called in order to decrement recv_io.posted.count >>>>> again. During that race manage_credits_prior_sending() might grant >>>>> too much credits. >>>>> >>>>> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, >>>>> I haven't tested this branch yet, I'm running out of time >>>>> for the day. >>>>> >>>>> But I tested it with smbclient and having a similar >>>>> logic in fs/smb/common/smbdirect/smbdirect_connection.c >>>> >>>> I was able to reproduce the problem and the fix I created >>>> for-6.18/ksmbd-smbdirect-regression-v5 was not correct. >>>> >>>> I needed to use >>>> >>>> available = atomic_xchg(&sc->recv_io.credits.available, 0); >>>> >>>> instead of >>>> >>>> available = atomic_read(&sc->recv_io.credits.available); >>>> atomic_sub(new_credits, &sc->recv_io.credits.available); >>>> >>>> This following branch works for me: >>>> for-6.18/ksmbd-smbdirect-regression-v7 >>>> and with the fixes again master this should also work: >>>> for-6.19/ksmbd-smbdirect-regression-v1 >>>> >>>> I'll post real patches tomorrow. >>>> >>>> Please check. >>> Okay, I will test it with two branches. >>> I'll try it too, but I recommend running frametest for performance >>> difference and stress testing. >>> >>> https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest >>> >>> ex) frametest.exe -w 4k -t 20 -n 2000 >> >> That works fine, but >> >> frametest.exe -r 4k -t 20 -n 2000 >> >> generates a continues stream of such messages: >> ksmbd: Failed to send message: -107 >> >> Both with 6.17.2 and for-6.19/ksmbd-smbdirect-regression-v1, >> so this is not a regression. >> >> I'll now check if the is related to the other problems >> I found and fixes in for-6.18/ksmbd-smbdirect-regression-v5 > > Ok, I found the problem. > > On send we are not allowed to consume the last send credit > without granting any credit to the peer. > > MS-SMBD 3.1.5.1 Sending Upper Layer Messages > > ... 
> If Connection.SendCredits is 1 and the CreditsGranted field of the message is 0, stop > processing. > ... > > MS-SMBD 3.1.5.9 Managing Credits Prior to Sending > > ... > If Connection.ReceiveCredits is zero, or if Connection.SendCredits is one and the > Connection.SendQueue is not empty, the sender MUST allocate and post at least one receive of size > Connection.MaxReceiveSize and MUST increment Connection.ReceiveCredits by the number > allocated and posted. If no receives are posted, the processing MUST return a value of zero to indicate > to the caller that no Send message can be currently performed. > ... > > It works in my master-ipproto-smbdirect branch, see the top commit. > > I'll backport the related logic to ksmbd on top of > for-6.19/ksmbd-smbdirect-regression-v1 tomorrow. for-6.19/ksmbd-smbdirect-regression-v2 has the fixes and works for me, I'll prepare official patches (most likely) on Monday. metze ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-17 13:15 ` Stefan Metzmacher @ 2026-01-18 8:03 ` Namjae Jeon 2026-01-19 17:28 ` Stefan Metzmacher 0 siblings, 1 reply; 24+ messages in thread From: Namjae Jeon @ 2026-01-18 8:03 UTC (permalink / raw) To: Stefan Metzmacher Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org On Sat, Jan 17, 2026 at 10:15 PM Stefan Metzmacher <metze@samba.org> wrote: > > Am 17.01.26 um 00:08 schrieb Stefan Metzmacher: > > Am 15.01.26 um 10:50 schrieb Stefan Metzmacher: > >> Am 15.01.26 um 03:01 schrieb Namjae Jeon: > >>> On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote: > >>>> > >>>> Am 15.12.25 um 21:17 schrieb Stefan Metzmacher: > >>>>> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: > >>>>>> Am 13.12.25 um 03:14 schrieb Namjae Jeon: > >>>>>>>> I've put these changes a long with rw credit fixes into my > >>>>>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to > >>>>>>>> test this? > >>>>>>> Problems still occur. See: > >>>>>> > >>>>>> :-( Would you be able to use rxe and cake a network capture? > >>>>>> > >>>>>> Using test files with all zeros, e.g. > >>>>>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 > >>>>>> would allow gzip --best on the capture file to compress well... > >>>>> > >>>>> I think I found something that explains it and > >>>>> I was able to reproduce and what I have in mind. > >>>>> > >>>>> We increment recv_io.posted.count after ib_post_recv() > >>>>> > >>>>> And manage_credits_prior_sending() uses > >>>>> > >>>>> new_credits = recv_io.posted.count - recv_io.credits.count > >>>>> > >>>>> But there is a race between the hardware receiving a message > >>>>> and recv_done being called in order to decrement recv_io.posted.count > >>>>> again. During that race manage_credits_prior_sending() might grant > >>>>> too much credits. > >>>>> > >>>>> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, > >>>>> I haven't tested this branch yet, I'm running out of time > >>>>> for the day. > >>>>> > >>>>> But I tested it with smbclient and having a similar > >>>>> logic in fs/smb/common/smbdirect/smbdirect_connection.c > >>>> > >>>> I was able to reproduce the problem and the fix I created > >>>> for-6.18/ksmbd-smbdirect-regression-v5 was not correct. > >>>> > >>>> I needed to use > >>>> > >>>> available = atomic_xchg(&sc->recv_io.credits.available, 0); > >>>> > >>>> instead of > >>>> > >>>> available = atomic_read(&sc->recv_io.credits.available); > >>>> atomic_sub(new_credits, &sc->recv_io.credits.available); > >>>> > >>>> This following branch works for me: > >>>> for-6.18/ksmbd-smbdirect-regression-v7 > >>>> and with the fixes again master this should also work: > >>>> for-6.19/ksmbd-smbdirect-regression-v1 > >>>> > >>>> I'll post real patches tomorrow. > >>>> > >>>> Please check. > >>> Okay, I will test it with two branches. > >>> I'll try it too, but I recommend running frametest for performance > >>> difference and stress testing. > >>> > >>> https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest > >>> > >>> ex) frametest.exe -w 4k -t 20 -n 2000 > >> > >> That works fine, but > >> > >> frametest.exe -r 4k -t 20 -n 2000 > >> > >> generates a continues stream of such messages: > >> ksmbd: Failed to send message: -107 > >> > >> Both with 6.17.2 and for-6.19/ksmbd-smbdirect-regression-v1, > >> so this is not a regression. 
> >> > >> I'll now check if the is related to the other problems > >> I found and fixes in for-6.18/ksmbd-smbdirect-regression-v5 > > > > Ok, I found the problem. > > > > On send we are not allowed to consume the last send credit > > without granting any credit to the peer. > > > > MS-SMBD 3.1.5.1 Sending Upper Layer Messages > > > > ... > > If Connection.SendCredits is 1 and the CreditsGranted field of the message is 0, stop > > processing. > > ... > > > > MS-SMBD 3.1.5.9 Managing Credits Prior to Sending > > > > ... > > If Connection.ReceiveCredits is zero, or if Connection.SendCredits is one and the > > Connection.SendQueue is not empty, the sender MUST allocate and post at least one receive of size > > Connection.MaxReceiveSize and MUST increment Connection.ReceiveCredits by the number > > allocated and posted. If no receives are posted, the processing MUST return a value of zero to indicate > > to the caller that no Send message can be currently performed. > > ... > > > > It works in my master-ipproto-smbdirect branch, see the top commit. > > > > I'll backport the related logic to ksmbd on top of > > for-6.19/ksmbd-smbdirect-regression-v1 tomorrow. > > for-6.19/ksmbd-smbdirect-regression-v2 has the fixes and works for > me, I'll prepare official patches (most likely) on Monday. I have tested the for-6.19/ksmbd-smbdirect-regression-v2 branch, and I can confirm that the issues I previously encountered in my test environment have been fixed. I have a couple of follow-up questions regarding this fix: 1. Regarding your frametest results, did you not observe any performance degradation or difference compared to linux-6.17.9? 2. You mentioned previously testing with Intel E810-CQDA2 NICs. Have you tested both iWARP and RoCEv2 modes on the E810? Thanks. > > metze > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-18 8:03 ` Namjae Jeon @ 2026-01-19 17:28 ` Stefan Metzmacher 2026-01-19 19:17 ` Stefan Metzmacher 0 siblings, 1 reply; 24+ messages in thread From: Stefan Metzmacher @ 2026-01-19 17:28 UTC (permalink / raw) To: Namjae Jeon Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org Am 18.01.26 um 09:03 schrieb Namjae Jeon: > On Sat, Jan 17, 2026 at 10:15 PM Stefan Metzmacher <metze@samba.org> wrote: >> >> Am 17.01.26 um 00:08 schrieb Stefan Metzmacher: >>> Am 15.01.26 um 10:50 schrieb Stefan Metzmacher: >>>> Am 15.01.26 um 03:01 schrieb Namjae Jeon: >>>>> On Thu, Jan 15, 2026 at 3:13 AM Stefan Metzmacher <metze@samba.org> wrote: >>>>>> >>>>>> Am 15.12.25 um 21:17 schrieb Stefan Metzmacher: >>>>>>> Am 14.12.25 um 23:56 schrieb Stefan Metzmacher: >>>>>>>> Am 13.12.25 um 03:14 schrieb Namjae Jeon: >>>>>>>>>> I've put these changes a long with rw credit fixes into my >>>>>>>>>> for-6.18/ksmbd-smbdirect-regression-v4 branch, are you able to >>>>>>>>>> test this? >>>>>>>>> Problems still occur. See: >>>>>>>> >>>>>>>> :-( Would you be able to use rxe and cake a network capture? >>>>>>>> >>>>>>>> Using test files with all zeros, e.g. >>>>>>>> dd if=/dev/zero of=/tmp/4096MBzeros-sparse.dat seek=4096MB bs=1 count=1 >>>>>>>> would allow gzip --best on the capture file to compress well... >>>>>>> >>>>>>> I think I found something that explains it and >>>>>>> I was able to reproduce and what I have in mind. >>>>>>> >>>>>>> We increment recv_io.posted.count after ib_post_recv() >>>>>>> >>>>>>> And manage_credits_prior_sending() uses >>>>>>> >>>>>>> new_credits = recv_io.posted.count - recv_io.credits.count >>>>>>> >>>>>>> But there is a race between the hardware receiving a message >>>>>>> and recv_done being called in order to decrement recv_io.posted.count >>>>>>> again. During that race manage_credits_prior_sending() might grant >>>>>>> too much credits. >>>>>>> >>>>>>> Please test my for-6.18/ksmbd-smbdirect-regression-v5 branch, >>>>>>> I haven't tested this branch yet, I'm running out of time >>>>>>> for the day. >>>>>>> >>>>>>> But I tested it with smbclient and having a similar >>>>>>> logic in fs/smb/common/smbdirect/smbdirect_connection.c >>>>>> >>>>>> I was able to reproduce the problem and the fix I created >>>>>> for-6.18/ksmbd-smbdirect-regression-v5 was not correct. >>>>>> >>>>>> I needed to use >>>>>> >>>>>> available = atomic_xchg(&sc->recv_io.credits.available, 0); >>>>>> >>>>>> instead of >>>>>> >>>>>> available = atomic_read(&sc->recv_io.credits.available); >>>>>> atomic_sub(new_credits, &sc->recv_io.credits.available); >>>>>> >>>>>> This following branch works for me: >>>>>> for-6.18/ksmbd-smbdirect-regression-v7 >>>>>> and with the fixes again master this should also work: >>>>>> for-6.19/ksmbd-smbdirect-regression-v1 >>>>>> >>>>>> I'll post real patches tomorrow. >>>>>> >>>>>> Please check. >>>>> Okay, I will test it with two branches. >>>>> I'll try it too, but I recommend running frametest for performance >>>>> difference and stress testing. >>>>> >>>>> https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest >>>>> >>>>> ex) frametest.exe -w 4k -t 20 -n 2000 >>>> >>>> That works fine, but >>>> >>>> frametest.exe -r 4k -t 20 -n 2000 >>>> >>>> generates a continues stream of such messages: >>>> ksmbd: Failed to send message: -107 >>>> >>>> Both with 6.17.2 and for-6.19/ksmbd-smbdirect-regression-v1, >>>> so this is not a regression. 
>>>> >>>> I'll now check if the is related to the other problems >>>> I found and fixes in for-6.18/ksmbd-smbdirect-regression-v5 >>> >>> Ok, I found the problem. >>> >>> On send we are not allowed to consume the last send credit >>> without granting any credit to the peer. >>> >>> MS-SMBD 3.1.5.1 Sending Upper Layer Messages >>> >>> ... >>> If Connection.SendCredits is 1 and the CreditsGranted field of the message is 0, stop >>> processing. >>> ... >>> >>> MS-SMBD 3.1.5.9 Managing Credits Prior to Sending >>> >>> ... >>> If Connection.ReceiveCredits is zero, or if Connection.SendCredits is one and the >>> Connection.SendQueue is not empty, the sender MUST allocate and post at least one receive of size >>> Connection.MaxReceiveSize and MUST increment Connection.ReceiveCredits by the number >>> allocated and posted. If no receives are posted, the processing MUST return a value of zero to indicate >>> to the caller that no Send message can be currently performed. >>> ... >>> >>> It works in my master-ipproto-smbdirect branch, see the top commit. >>> >>> I'll backport the related logic to ksmbd on top of >>> for-6.19/ksmbd-smbdirect-regression-v1 tomorrow. >> >> for-6.19/ksmbd-smbdirect-regression-v2 has the fixes and works for >> me, I'll prepare official patches (most likely) on Monday. > I have tested the for-6.19/ksmbd-smbdirect-regression-v2 branch, and I > can confirm that the issues I previously encountered in my test > environment have been fixed. Great! Thanks for testing! > I have a couple of follow-up questions regarding this fix: > 1. Regarding your frametest results, did you not observe any > performance degradation or difference compared to linux-6.17.9? Sorry, I don't understand what you are asking for. Do you mean with v6.19-rc5, for-6.19/ksmbd-smbdirect-regression-v1 or for-6.19/ksmbd-smbdirect-regression-v2? > 2. You mentioned previously testing with Intel E810-CQDA2 NICs. Have > you tested both iWARP and RoCEv2 modes on the E810? Yes, both while there seem to be strange problems with iWarp. I'll have to re-test with these cards, we'll test if it's possible to have both cards installed together both only getting 8 PCIe 5 lanes, that would make it easier to test. At the time I was always testing with KSAN, lockdep and other debugging features turned on, so performance was not as expected anyway... metze ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth 2026-01-19 17:28 ` Stefan Metzmacher @ 2026-01-19 19:17 ` Stefan Metzmacher 0 siblings, 0 replies; 24+ messages in thread From: Stefan Metzmacher @ 2026-01-19 19:17 UTC (permalink / raw) To: Namjae Jeon, Steve French Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org >>> for-6.19/ksmbd-smbdirect-regression-v2 has the fixes and works for >>> me, I'll prepare official patches (most likely) on Monday. >> I have tested the for-6.19/ksmbd-smbdirect-regression-v2 branch, and I >> can confirm that the issues I previously encountered in my test >> environment have been fixed. > > Great! Thanks for testing! I just realized that I need to fix the client side, I'll try to finish these tomorrow. I'll post all patches then as I also need to figure out the stable tags for all needed patches, so that 6.18 will get everything needed. metze ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth
2025-12-05 2:33 ` Namjae Jeon
2025-12-05 12:21 ` Namjae Jeon
@ 2025-12-08 16:02 ` Stefan Metzmacher
1 sibling, 0 replies; 24+ messages in thread
From: Stefan Metzmacher @ 2025-12-08 16:02 UTC (permalink / raw)
To: Namjae Jeon
Cc: Tom Talpey, linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Am 05.12.25 um 03:33 schrieb Namjae Jeon:
> On Thu, Dec 4, 2025 at 6:40 PM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> Hi Namjae,
> Hi Metze,
>>
>>> Okay, It seems like the issue has been improved in your v3 branch. If
>>> you send the official patches, I will test it more.
>>
>> It's good to have verified that for-6.18/ksmbd-smbdirect-regression-v3
>> on a 6.18 kernel behaves the same as with 6.17.9, as transport_rdma.c
>> is the same, but it doesn't really allow forward process on
>> the Mellanox problem.
>>
>> Can you at least post the dmesg output generated by this:
>> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=7e724ebc58e986f4e101a55f4ab5e96912239918
>> Assuming that this wasn't triggered:
>> if (WARN_ONCE(needed > max_possible, "needed:%u > max:%u\n", needed, max_possible))
> I didn't know you wanted it. I will share it after office.
>>
>> Did you run the bpftrace command? Did it print a lot of
>> 'smb_direct_rdma_xmit' message over the whole time of the file copy?
> No, I didn't check it. but I will try this.
>>
>> Did you actually copied a file to or from the server?
> nod.

I asked what you were trying, not if it worked.

>>
>> Have you actually tested for-6.18/ksmbd-smbdirect-regression-v2,
>> as requested? As I was in hope that it would work in the
>> same way as for-6.18/ksmbd-smbdirect-regression-v3,
>> but with only a single patch reverted.
> I tested the v2 patch and the same issues still occurred, but they are
> gone in v3.
>>
>> I'll continue to fix the general problem that this works
>> for non Mellanox setups, as it seems it never worked at all :-(
> Smbdirect should work well on Mellanox NICs. As I said before, most
> people use this. I've rarely seen ksmbd users use smbdirect with
> non-Mellanox NICs. If you want to have a stable, long-term smbdirect
> feature on Samba, you'll need to have this device.

Yes, I'll try to buy two ConnectX cards, but it will take some time
until they arrive.

And from what I found out last week while fixing ksmbd to work with
irdma in roce as well as rxe, to me it was pure luck that it worked
with Mellanox.

>>
>> Where you testing with RoCEv2 or Infiniband?
> RoCEv2
>>
>> I think moving forward for Mellanox setups requires these steps:
>> - Test v1 vs. v2 and see that smb_direct_rdma_xmit is actually
>> called at all. And see the dmesg output.
>> - Testing with Mellanox RoCEv2 on the client and rxe on
>> the server, so that we can create a network capture with tcpdump.

Would you be able to test with rxe on the ksmbd side?
Is it possible to disable both infiniband and roce on the Mellanox nic?
So that you can use rxe instead and take a capture?

Thanks!
metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Problem with smbdirect rw credits and initiator_depth
2025-12-03 18:18 Problem with smbdirect rw credits and initiator_depth Stefan Metzmacher
2025-12-04 0:07 ` Namjae Jeon
@ 2025-12-04 9:57 ` Stefan Metzmacher
1 sibling, 0 replies; 24+ messages in thread
From: Stefan Metzmacher @ 2025-12-04 9:57 UTC (permalink / raw)
To: Namjae Jeon, Tom Talpey
Cc: linux-cifs@vger.kernel.org, linux-rdma@vger.kernel.org
Hi Tom,

> I assume the solution is to change smb_direct_rdma_xmit, so that
> it doesn't try to get credits for all RDMA read/write requests at once.
> Instead after collecting all ib_send_wr structures from all rdma_rw_ctx_wrs()
> we chunk the list to stay in the negotiated initiator depth,
> before passing to ib_post_send().
>
> At least we need to limit this for RDMA read requests, for RDMA write requests
> we may not need to chunk and post them all together, but still chunking might
> be good in order to avoid blocking concurrent RDMA sends.
>
> Tom is this assumption correct?

I guess these manpages explain it as I expected:

For the client:
https://www.man7.org/linux/man-pages/man3/rdma_connect.3.html

responder_resources
    The maximum number of outstanding RDMA read and atomic operations
    that the local side will accept from the remote side. Applies only
    to RDMA_PS_TCP. This value must be less than or equal to the local
    RDMA device attribute max_qp_rd_atom and remote RDMA device
    attribute max_qp_init_rd_atom. The remote endpoint can adjust this
    value when accepting the connection.

For the server:
https://www.man7.org/linux/man-pages/man3/rdma_accept.3.html

initiator_depth
    The maximum number of outstanding RDMA read and atomic operations
    that the local side will have to the remote side. Applies only to
    RDMA_PS_TCP. This value must be less than or equal to the local
    RDMA device attribute max_qp_init_rd_atom and the initiator_depth
    value reported in the connect request event.

In general I'm wondering why we set conn_param.retry_count = 6
(both client and server)

retry_count
    The maximum number of times that a data transfer operation should
    be retried on the connection when an error occurs. This setting
    controls the number of times to retry send, RDMA, and atomic
    operations when timeouts occur. Applies only to RDMA_PS_TCP.

I guess if the initiator_depth/responder_resources values are respected
by the server when doing RDMA reads, there should never be a reason to
retry, correct?

So we should use retry_count = 0, otherwise this may randomly mask
problems.

Do you remember what you used in Windows?

Thanks!
metze
^ permalink raw reply	[flat|nested] 24+ messages in thread
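As a sketch of the rdma_accept() side being discussed (not the actual ksmbd code; the function name and surrounding details are invented for illustration), the server could clamp the values from the connect request to the local device limits and make retry_count explicit:

#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>

static int accept_with_limits(struct rdma_cm_id *cm_id,
			      const struct rdma_conn_param *req)
{
	struct ib_device_attr *attr = &cm_id->device->attrs;
	struct rdma_conn_param conn_param = {};

	/* RDMA reads/atomics the peer may have in flight towards us */
	conn_param.responder_resources =
		min_t(u8, req->responder_resources, attr->max_qp_rd_atom);
	/* RDMA reads/atomics we may have in flight towards the peer */
	conn_param.initiator_depth =
		min_t(u8, req->initiator_depth, attr->max_qp_init_rd_atom);
	/* retry_count = 0 as suggested above, so errors are not masked */
	conn_param.retry_count = 0;

	return rdma_accept(cm_id, &conn_param);
}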