* [PATCH 0/2] svcrdma: Reduce svcrdma_wq contention on the Send completion path
@ 2026-05-06 15:26 Chuck Lever
From: Chuck Lever @ 2026-05-06 15:26 UTC
  To: Mike Snitzer, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey
  Cc: linux-nfs, linux-rdma, Chuck Lever

Profiling an 8KB NFSv3 read/write workload over RDMA shows about
4% of total CPU spent on the svcrdma_wq unbound workqueue pool
spinlock. Each Send completion queues work on svcrdma_wq to
release the send_ctxt, and that work item queues another item
for each write_info chunk it owns. Every queue_work step contends
on the same pool lock.
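
For reference, the inner re-queue is the write_info free path, which
looks roughly like this today (a sketch with paraphrased names, not
the exact code in the tree):

    static void svc_rdma_write_info_free_async(struct work_struct *work)
    {
            struct svc_rdma_write_info *info =
                    container_of(work, struct svc_rdma_write_info, wi_work);

            /* DMA unmap and SG teardown for this chunk */
            svc_rdma_cc_release(info->wi_rdma, &info->wi_cc, DMA_TO_DEVICE);
            kfree(info);
    }

    static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
    {
            INIT_WORK(&info->wi_work, svc_rdma_write_info_free_async);
            queue_work(svcrdma_wq, &info->wi_work); /* second pool-lock hit */
    }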

The first patch removes the inner re-queue.
svc_rdma_write_info_free already runs on svcrdma_wq from its
caller, so the extra work item only adds another spinlock
acquisition with no parallelism to gain. Inlining the chunk
release recovers roughly 1% of CPU cycles. Mike, your workload
might see relief from this patch alone.
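
With the re-queue removed, the free path might collapse to something
like this (same caveat on names):

    static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
    {
            /* Caller already runs on svcrdma_wq: release inline */
            svc_rdma_cc_release(info->wi_rdma, &info->wi_cc, DMA_TO_DEVICE);
            kfree(info);
    }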

The second patch retires svcrdma_wq. Send completion handlers
append the send_ctxt to a per-transport lock-free list, and the
nfsd thread drains the list in xpo_release_ctxt between RPCs.
DMA unmap and page release move out of the completion context.
That matters when an IOMMU runs in strict mode, where each unmap
synchronously invalidates the IOTLB; the nfsd thread absorbs that
latency where it is harmless and batches teardown across all
completions that accumulated during the prior RPC.
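
In sketch form, assuming struct svcxprt_rdma grows an llist_head
named sc_send_release_list and the send_ctxt an llist_node (the
field name sc_release_node here is illustrative):

    /* Send completion: constant-time, lock-free hand-off */
    static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
    {
            struct svcxprt_rdma *rdma = cq->cq_context;
            struct svc_rdma_send_ctxt *ctxt =
                    container_of(wc->wr_cqe, struct svc_rdma_send_ctxt,
                                 sc_cqe);

            llist_add(&ctxt->sc_release_node, &rdma->sc_send_release_list);
    }

    /* nfsd thread, between RPCs (recv-side release elided) */
    static void svc_rdma_release_ctxt(struct svc_xprt *xprt, void *vctxt)
    {
            struct svcxprt_rdma *rdma =
                    container_of(xprt, struct svcxprt_rdma, sc_xprt);
            struct svc_rdma_send_ctxt *ctxt, *next;
            struct llist_node *list;

            list = llist_del_all(&rdma->sc_send_release_list);
            llist_for_each_entry_safe(ctxt, next, list, sc_release_node) {
                    /* DMA unmap and page release happen here, where
                     * strict-mode IOTLB invalidation can block without
                     * stalling the completion queue */
                    svc_rdma_send_ctxt_put(rdma, ctxt);
            }
    }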

A self-enqueue covers the trailing edge of a burst. When a Send
completion finds sc_send_release_list previously empty on an idle
connection, it sets XPT_DATA and enqueues the transport. The nfsd
thread enters svc_rdma_recvfrom, finds nothing to receive, and
returns; svc_xprt_release then runs xpo_release_ctxt and drains
the list. Without that wakeup, a Send completion arriving after
the last xpo_release_ctxt would leave the send_ctxt's DMA mappings
and reply pages pinned until the next RPC, send-context exhaustion,
or transport close.
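
The completion-side add above then grows the wakeup. llist_add()
returns true only when the list was empty before the add, so the
enqueue fires only on the empty-to-non-empty transition:

    if (llist_add(&ctxt->sc_release_node, &rdma->sc_send_release_list)) {
            /* List was idle: no RPC behind us will trigger the next
             * drain, so poke an nfsd thread ourselves */
            set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
            svc_xprt_enqueue(&rdma->sc_xprt);
    }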

The patches were rebased today but have not been retested recently.

---
Chuck Lever (2):
      svcrdma: Release write chunk resources without re-queuing
      svcrdma: Defer send context release to xpo_release_ctxt

 include/linux/sunrpc/svc_rdma.h          |  6 +--
 net/sunrpc/xprtrdma/svc_rdma.c           | 18 +------
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  9 ++++
 net/sunrpc/xprtrdma/svc_rdma_rw.c        | 13 +----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 91 +++++++++++++++++++++++---------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  3 +-
 6 files changed, 84 insertions(+), 56 deletions(-)
---
base-commit: d1c29a34fe35c1eb9331cab0537c7bb583692187
change-id: 20260506-svcrdma-next-2e736249390f

Best regards,
--  
Chuck Lever <chuck.lever@oracle.com>

