From: Chuck Lever <cel@kernel.org>
To: Anna Schumaker <anna@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v2 0/3] xprtrdma: Decouple req recycling from RPC completion
Date: Fri, 22 May 2026 20:02:49 -0400
Message-ID: <20260523000252.465074-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

rl_kref currently conflates two lifetimes through one refcount:
it gates when a Reply can wake its RPC task, and it gates when
an rpcrdma_req can return to its free pool. The marshal path
takes the Send-side reference only when sc_unmap_count > 0, so
a Send carrying only pre-registered buffers takes no Send-side
reference. When the Reply for such an RPC arrives before its
Send completes, the Reply handler drops rl_kref from 1 to 0
and frees the req while the HCA may still be DMA-reading from
its send buffer. If the req is reused in the meantime, a
retransmission can put different bytes on the wire.

This series narrows rl_kref's job. The RPC layer takes one
reference at slot allocation; rpcrdma_prepare_send_sges() takes
a Send-side reference unconditionally after WR prep succeeds.
A req returns to its free pool only after both owners release.
Replies complete the RPC directly, without atomic activity on
rl_kref.
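
The two-owner rule above can be modeled in userspace as a rough
sketch. Nothing here is the kernel code: struct model_req and the
model_req_*() helpers are hypothetical names, and a C11 atomic_int
stands in for rl_kref. The point is only that the req reaches the
free pool when the second of the two owners releases, in either
order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for rpcrdma_req; not the kernel struct. */
struct model_req {
	atomic_int refs;	/* plays the role of rl_kref */
	bool in_free_pool;	/* set once the last owner releases */
};

/* Slot allocation: the RPC layer takes the first reference. */
static void model_req_init(struct model_req *req)
{
	atomic_init(&req->refs, 1);
	req->in_free_pool = false;
}

/* Taken unconditionally once Send WR preparation has succeeded. */
static void model_req_get_send(struct model_req *req)
{
	atomic_fetch_add(&req->refs, 1);
}

/*
 * Either owner releases through this helper. Returns true only for
 * the release that dropped the count to zero, which is the point
 * where the req may be recycled.
 */
static bool model_req_put(struct model_req *req)
{
	int old = atomic_fetch_sub(&req->refs, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1) {
		req->in_free_pool = true;
		return true;
	}
	return false;
}
```

In this model a Reply that arrives before Send completion calls
model_req_put() for the RPC layer and gets false back, so the send
buffer stays out of the free pool until the Send side also
releases.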

Three design choices shape the series.

The Send-side reference is taken only on the success path of
rpcrdma_prepare_send_sges(). Marshal failure runs
rpcrdma_sendctx_cancel(), which unmaps the SGEs and clears
sc_req without touching rl_kref. Sendctx ring walks in
rpcrdma_sendctx_put_locked() and rpcrdma_sendctxs_destroy()
skip entries whose sc_req is NULL, so a burst of -EIO marshal
failures cannot hold reqs off rb_send_bufs.
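
A minimal sketch of the skip-on-NULL walk, under stated
assumptions: the ring is a fixed array, the refcount is a plain
int, and ring_req, ring_sendctx, ring_sendctx_cancel() and
ring_walk() are hypothetical names. Only the sc_req field name
comes from the description above. SGE unmapping is left out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical ring size */

/* Hypothetical stand-ins; only the name sc_req follows the text above. */
struct ring_req {
	int refs;	/* models rl_kref as a plain counter */
};

struct ring_sendctx {
	struct ring_req *sc_req;	/* NULL after cancel or release */
};

/* Marshal-failure path: forget the req without touching its refcount. */
static void ring_sendctx_cancel(struct ring_sendctx *sc)
{
	sc->sc_req = NULL;
}

/*
 * Walk entries [from, to) modulo RING_SIZE and drop the Send-side
 * reference of each entry that still points at a req. Returns the
 * number of references dropped.
 */
static unsigned int ring_walk(struct ring_sendctx *ring,
			      unsigned int from, unsigned int to)
{
	unsigned int dropped = 0;

	for (unsigned int i = from; i != to; i = (i + 1) % RING_SIZE) {
		struct ring_req *req = ring[i].sc_req;

		if (!req)
			continue;	/* cancelled entry: nothing to release */
		assert(req->refs > 0);
		req->refs--;
		ring[i].sc_req = NULL;
		dropped++;
	}
	return dropped;
}
```

Because a cancelled entry never took a Send-side reference, the
walker has nothing to drop for it, and the req's remaining
reference is unaffected.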

Connection teardown drains the sendctx ring against pre-reset
reqs by ordering rpcrdma_sendctxs_destroy() ahead of
rpcrdma_reqs_reset() in rpcrdma_xprt_disconnect(). The drain
releases Send-side references whose unsignaled Sends never had
a later signaled completion to walk the ring. On the
backchannel, releasing a bc_prealloc req re-adds it to
bc_pa_list, which xprt_destroy_backchannel() has already
emptied; xprt_rdma_destroy() runs xprt_rdma_bc_destroy() a
second time after the disconnect to reclaim those reqs.

With recycling now gated on Send completion, completed RPCs
can remain pinned by the sendctx ring until the next signaled
Send completion. The headroom is bounded: re_send_batch is set
to re_max_requests >> 3. The req pool gains max_reqs/8 slack
(patch 3) so the recycle delay does not stall a slot
allocation that the RPC/RDMA credit window would admit.
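
The arithmetic can be illustrated with a small sketch. The helper
names are hypothetical, and the pool-size formula is an assumption
that the slack is simply added to max_reqs; the actual allocation
in patch 3 may account for other reqs as well.

```c
#include <assert.h>
#include <limits.h>

/* Signaling interval as described above: re_max_requests >> 3. */
static unsigned int model_send_batch(unsigned int max_requests)
{
	return max_requests >> 3;
}

/*
 * Assumed pool size: the credit window plus one eighth of it as
 * slack for reqs still pinned by the sendctx ring.
 */
static unsigned int model_pool_size(unsigned int max_reqs)
{
	unsigned int slack = max_reqs / 8;

	assert(max_reqs <= UINT_MAX - slack);	/* guard against wraparound */
	return max_reqs + slack;
}
```

For an illustrative window of 128 requests, this gives a batch of
16 and a pool of 144, so the slack matches the number of Sends
that can go by between signaled completions.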

Changes since v1:
- Split into three patches. A prep patch converts the
  Send-signaling test from a kref_read to sc_unmap_count, and
  a separate patch names the request-pool slack at its
  allocation site.
- Wrap the bc_prealloc release branch in
  CONFIG_SUNRPC_BACKCHANNEL (kernel test robot, build break on
  configs without the backchannel).
- Order rpcrdma_sendctxs_destroy() ahead of
  rpcrdma_reqs_reset() in rpcrdma_xprt_disconnect() so the
  drain runs against pre-reset reqs.
- Run xprt_rdma_bc_destroy() a second time from
  xprt_rdma_destroy() to reclaim bc_prealloc reqs returned by
  the disconnect's drain.
- Add rpcrdma_sendctx_cancel() for the marshal-failure path;
  sendctx ring walkers skip entries with sc_req == NULL.

Link to v1: https://lore.kernel.org/r/20260520175016.29480-1-cel@kernel.org

Chuck Lever (3):
  xprtrdma: Use sendctx DMA state for Send signaling
  xprtrdma: Decouple req recycling from RPC completion
  xprtrdma: Add request-pool slack for delayed recycling

 net/sunrpc/xprtrdma/backchannel.c |  5 +--
 net/sunrpc/xprtrdma/frwr_ops.c    |  2 +-
 net/sunrpc/xprtrdma/rpc_rdma.c    | 63 ++++++++++++++-----------------
 net/sunrpc/xprtrdma/transport.c   | 55 ++++++++++++++++++++++++---
 net/sunrpc/xprtrdma/verbs.c       | 46 +++++++++++++++++++---
 net/sunrpc/xprtrdma/xprt_rdma.h   |  2 +-
 6 files changed, 124 insertions(+), 49 deletions(-)

--
2.54.0
