Linux RDMA and InfiniBand development
From: Chuck Lever <cel@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>,
	Jeff Layton <jlayton@kernel.org>,  NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	 Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
	 Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH 0/2] svcrdma: Reduce svcrdma_wq contention on the Send completion path
Date: Wed, 06 May 2026 11:26:49 -0400
Message-ID: <20260506-svcrdma-next-v1-0-915fce8c4fbb@oracle.com>

Profiling an 8KB NFSv3 read/write workload over RDMA shows about
4% of total CPU spent on the svcrdma_wq unbound workqueue pool
spinlock. Each Send completion queues work on svcrdma_wq to
release the send_ctxt, and that work item queues another item
for each write_info chunk it owns. Every queue_work step contends
on the same pool lock.

The first patch removes the inner re-queue.
The caller of svc_rdma_write_info_free already runs on
svcrdma_wq, so the extra work item only adds another spinlock
acquisition with no parallelism to gain. Inlining the chunk
release recovers roughly 1% of CPU cycles. Mike, your workload
might benefit from this patch alone.
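To make the lock arithmetic concrete, here is a minimal userspace model, not kernel code. It assumes every queue_work() call on svcrdma_wq costs exactly one pool-lock acquisition and runs the work immediately; all model_* names, send_ctxt_release_*() and simulate() are hypothetical.

```c
/* Userspace model only: counts modeled svcrdma_wq pool-lock
 * acquisitions per Send completion. All names are hypothetical. */

static unsigned int pool_lock_acquisitions; /* modeled pool spinlock takes */
static unsigned int chunks_released;        /* write_info chunks torn down */
static unsigned int pending_chunks;         /* chunks owned by the send_ctxt */

static void release_one_chunk(void)
{
	chunks_released++;
}

/* Stand-in for queue_work(): assume one pool-lock acquisition per call,
 * then run the work function immediately. */
static void model_queue_work(void (*fn)(void))
{
	pool_lock_acquisitions++;
	fn();
}

/* Before patch 1: the send_ctxt release work re-queues each chunk. */
static void send_ctxt_release_requeue(void)
{
	while (pending_chunks > 0) {
		pending_chunks--;
		model_queue_work(release_one_chunk);
	}
}

/* After patch 1: the work item already runs on svcrdma_wq, so it
 * releases each chunk in place. */
static void send_ctxt_release_inline(void)
{
	while (pending_chunks > 0) {
		pending_chunks--;
		release_one_chunk();
	}
}

/* One Send completion whose send_ctxt owns nchunks write chunks. */
static unsigned int simulate(unsigned int nchunks, void (*release)(void))
{
	pool_lock_acquisitions = 0;
	chunks_released = 0;
	pending_chunks = nchunks;
	model_queue_work(release); /* the completion handler's queue_work */
	return pool_lock_acquisitions;
}
```

With three write chunks, simulate() reports four modeled lock acquisitions for the re-queuing path and one for the inline path; both release all three chunks.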

The second patch retires svcrdma_wq. Send completion handlers
append the send_ctxt to a per-transport lock-free list, and the
nfsd thread drains the list in xpo_release_ctxt between RPCs.
DMA unmap and page release move out of the completion context.
That matters when an IOMMU runs in strict mode, where each unmap
synchronously invalidates the IOTLB; the nfsd thread absorbs that
latency where it is harmless and batches teardown across all
completions that accumulated during the prior RPC.
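The list handoff can be sketched in userspace with C11 atomics. This sketch assumes semantics like the kernel's llist_add() (multi-producer push that reports whether the list was empty) and llist_del_all() (detach everything in one atomic exchange); the model_* types and functions are hypothetical stand-ins for sc_send_release_list and xpo_release_ctxt, not the patch's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send_ctxt with an intrusive list link. */
struct model_send_ctxt {
	struct model_send_ctxt *next;
	bool dma_mapped; /* stands in for DMA mappings and reply pages */
};

/* Hypothetical per-transport list head (models sc_send_release_list). */
struct model_release_list {
	_Atomic(struct model_send_ctxt *) first;
};

/* Completion side: lock-free push, safe with concurrent producers.
 * Returns true if the list was empty beforehand, like llist_add(). */
static bool model_list_add(struct model_send_ctxt *ctxt,
			   struct model_release_list *list)
{
	struct model_send_ctxt *old = atomic_load(&list->first);

	do {
		ctxt->next = old;
	} while (!atomic_compare_exchange_weak(&list->first, &old, ctxt));
	return old == NULL;
}

/* nfsd side: detach the whole list at once, like llist_del_all(). */
static struct model_send_ctxt *
model_list_del_all(struct model_release_list *list)
{
	return atomic_exchange(&list->first, NULL);
}

/* Modeled xpo_release_ctxt: tear down, in thread context, every context
 * that accumulated since the previous drain (newest first). */
static unsigned int model_release_ctxt(struct model_release_list *list)
{
	struct model_send_ctxt *ctxt = model_list_del_all(list);
	unsigned int n = 0;

	while (ctxt) {
		struct model_send_ctxt *next = ctxt->next;

		ctxt->dma_mapped = false; /* "unmap" outside completion context */
		ctxt = next;
		n++;
	}
	return n;
}
```

The completion handler does only a compare-and-swap, while the expensive teardown (the strict-mode IOTLB invalidation, in the real code) happens once per batch in the drain loop.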

A self-enqueue covers the trailing edge of a burst. When a Send
completion finds that sc_send_release_list was empty on an idle
connection, it sets XPT_DATA and enqueues the transport. The nfsd
thread enters svc_rdma_recvfrom, finds nothing to receive, and
returns; svc_xprt_release then runs xpo_release_ctxt and drains
the list. Without that wakeup, a Send completion arriving after
the last xpo_release_ctxt would leave the send_ctxt's DMA mappings
and reply pages pinned until the next RPC, send-context exhaustion,
or transport close.
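The trailing-edge wakeup rule can be modeled in isolation. This userspace sketch simplifies the list to a count of queued contexts, which is enough to detect the "previously empty" transition. The `idle` argument is an assumed input because the cover letter does not say how idleness is determined. model_xprt_enqueue() stands in for svc_xprt_enqueue(), and `data` stands in for XPT_DATA. All model_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified transport: counts queued send_ctxts instead of linking them. */
struct model_xprt {
	atomic_uint pending;   /* stands in for sc_send_release_list */
	atomic_bool data;      /* stands in for the XPT_DATA flag */
	unsigned int enqueues; /* wakeups issued (single-threaded model) */
};

/* Hypothetical stand-in for svc_xprt_enqueue(): wake an nfsd thread. */
static void model_xprt_enqueue(struct model_xprt *xprt)
{
	xprt->enqueues++;
}

/* Send completion: queue the context. Only the completion that makes the
 * list non-empty on an idle transport schedules a thread; later ones in
 * the same burst piggyback on that wakeup. "idle" is an assumed input. */
static void model_send_done(struct model_xprt *xprt, bool idle)
{
	bool was_empty = atomic_fetch_add(&xprt->pending, 1) == 0;

	if (was_empty && idle) {
		atomic_store(&xprt->data, true);
		model_xprt_enqueue(xprt);
	}
}

/* nfsd thread: clear the flag (as recvfrom finding nothing would), then
 * drain as xpo_release_ctxt would. Returns the number of contexts freed. */
static unsigned int model_thread_run(struct model_xprt *xprt)
{
	atomic_store(&xprt->data, false);
	return atomic_exchange(&xprt->pending, 0);
}
```

A burst of three completions on an idle transport issues one wakeup, and that thread frees all three. A completion that arrives while a thread is busy needs no wakeup, because that thread's own release step drains it.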

These patches were rebased today but have not been tested recently.

---
Chuck Lever (2):
      svcrdma: Release write chunk resources without re-queuing
      svcrdma: Defer send context release to xpo_release_ctxt

 include/linux/sunrpc/svc_rdma.h          |  6 +--
 net/sunrpc/xprtrdma/svc_rdma.c           | 18 +------
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  9 ++++
 net/sunrpc/xprtrdma/svc_rdma_rw.c        | 13 +----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 91 +++++++++++++++++++++++---------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  3 +-
 6 files changed, 84 insertions(+), 56 deletions(-)
---
base-commit: d1c29a34fe35c1eb9331cab0537c7bb583692187
change-id: 20260506-svcrdma-next-2e736249390f

Best regards,
--  
Chuck Lever <chuck.lever@oracle.com>


Thread overview: 5+ messages
2026-05-06 15:26 Chuck Lever [this message]
2026-05-06 15:26 ` [PATCH 1/2] svcrdma: Release write chunk resources without re-queuing Chuck Lever
2026-05-07 20:46   ` Mike Snitzer
2026-05-08 20:14     ` Chuck Lever
2026-05-06 15:26 ` [PATCH 2/2] svcrdma: Defer send context release to xpo_release_ctxt Chuck Lever
