From: Chuck Lever <cel@kernel.org>
To: Anna Schumaker <anna@kernel.org>
Cc: <linux-rdma@vger.kernel.org>, <linux-nfs@vger.kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v3 5/5] xprtrdma: Document and assert reply-handler invariants
Date: Tue, 26 May 2026 10:14:05 -0400 [thread overview]
Message-ID: <20260526141405.39877-6-cel@kernel.org> (raw)
In-Reply-To: <20260526141405.39877-1-cel@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>
The xprtrdma reply path has been the subject of recurring
LLM-driven review claims that 'an RPC can complete while
receive buffers are still DMA-mapped' or that 'the req can be
freed while the HCA still owns the send buffer.' No runtime
reproducer has surfaced, but the absence of a written-down
invariant set lets each pass of automated review reach the
same hypothetical conclusion. Subsequent fixes against
ce2f9a4d9ccc ('xprtrdma: Decouple req recycling from RPC
completion') closed the underlying races but did not document
the closure where future readers will look for it.
State the invariants explicitly in a comment above
rpcrdma_reply_handler() and back four of them with
WARN_ON_ONCE() probes positioned where each invariant is
locally checkable on the previous patch's cleaned-up
ownership state:
- I1 (Receive WR ownership): WARN at rpcrdma_post_recvs() that
a rep pulled from rb_free_reps carries rr_rqst == NULL.
- I2 (rep attachment): WARN at rpcrdma_reply_put() that
req->rl_reply was NULLed before the matching rep_put.
- I3 (Registered-MR fence): WARN at rpcrdma_complete_rqst()
that req->rl_registered is empty. Strong send-queue
ordering of the LocalInv WR chain makes the last
completion observe the ib_dma_unmap_sg() of every earlier
MR, so 'list empty' implies 'all MRs unmapped'.
- I4 (Send-buffer release): WARN at rpcrdma_req_release()
that req->rl_sendctx is NULL. Reaching the kref release
callback requires both the RPC-layer and Send-side
references to have dropped; the Send-side drop runs in
rpcrdma_sendctx_unmap(), which clears rl_sendctx
(previous patch). A non-NULL rl_sendctx here would mean
the Send-side owner had not run -- a contradiction.
The XXX comment in xprt_rdma_free() about signal-driven
release racing the Send completion described the pre-decouple
state. Replace it with a one-line note pointing at the
invariant set, since the kref scheme now holds the req across
the in-flight Send regardless of which path released the
rpc_task.
I5 (req lifecycle) is stated in the comment but not probed:
making it locally assertible would require moving kref_init
out of rpcrdma_req_release(), which in turn requires adding
kref_init to the bc_pa_list and backlog-wake reuse paths.
That restructuring is deferred -- the invariant is unchanged
either way.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 69 +++++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/transport.c | 13 +++++--
net/sunrpc/xprtrdma/verbs.c | 6 +++
3 files changed, 84 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f4b4abefc4e0..626cadec4555 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1336,6 +1336,11 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
struct rpc_rqst *rqst = rep->rr_rqst;
int status;
+ /* I3: every registered MR has been invalidated and
+ * ib_dma_unmap_sg()'d before complete_rqst runs.
+ */
+ WARN_ON_ONCE(!list_empty(&rpcr_to_rdmar(rqst)->rl_registered));
+
switch (rep->rr_proc) {
case rdma_msg:
status = rpcrdma_decode_msg(r_xprt, rep, rqst);
@@ -1367,6 +1372,70 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
goto out;
}
+/* Reply-side ownership invariants
+ *
+ * I1 (Receive WR ownership). A struct rpcrdma_rep is owned by the
+ * HCA between ib_post_recv() and the matching Receive completion.
+ * After ib_dma_sync_single_for_cpu() in rpcrdma_wc_receive() it is
+ * owned by the CPU until rpcrdma_rep_put() returns it to
+ * rb_free_reps; a rep on rb_free_reps is not re-posted until
+ * rpcrdma_post_recvs() pulls it off. Asserted: rpcrdma_post_recvs()
+ * WARNs that a pulled rep has rr_rqst == NULL.
+ *
+ * I2 (rep attachment). While req->rl_reply == rep, the rep cannot be
+ * re-posted. rpcrdma_reply_put() NULLs req->rl_reply before handing
+ * the rep to rpcrdma_rep_put(). Asserted: rpcrdma_reply_put() WARNs
+ * that rl_reply is NULL after the put.
+ *
+ * I3 (Registered-MR fence). On entry to rpcrdma_complete_rqst() every
+ * MR that was on req->rl_registered has had its rkey invalidated
+ * (remotely via IB_WC_WITH_INVALIDATE or locally via IB_WR_LOCAL_INV)
+ * and its pages ib_dma_unmap_sg()'d. The LocalInv chain is posted
+ * on a single QP; strong send-queue ordering makes the last
+ * completion (frwr_wc_localinv_done) observe the
+ * ib_dma_unmap_sg() that ran from each earlier completion's
+ * frwr_mr_put() before complete_rqst is called. The inline
+ * frwr_reminv() path unmaps its one MR synchronously before
+ * rpcrdma_reply_handler() reaches complete_rqst. Asserted:
+ * rpcrdma_complete_rqst() WARNs that rl_registered is empty.
+ *
+ * I4 (Send-buffer release). req->rl_kref carries two unconditional
+ * owners while a Send is outstanding: the RPC-layer reference (set
+ * at xprt_rdma_alloc_slot / xprt_rdma_bc_rqst_get / rpcrdma_req_release
+ * pool-entry) and the Send-side reference (kref_get() in
+ * rpcrdma_prepare_send_sges()). rpcrdma_req_release() runs only
+ * after both have dropped, so the req does not return to its free
+ * pool until rpcrdma_sendctx_unmap() has fired -- the HCA has
+ * released the send buffer before the req can be reused. Asserted:
+ * rpcrdma_req_release() WARNs that rl_sendctx is NULL.
+ *
+ * I5 (req lifecycle). A req is owned by the RPC layer between slot
+ * acquisition and the matching xprt_rdma_free_slot() (or, for the
+ * backchannel, xprt_rdma_bc_free_rqst()). While owned, rl_kref >= 1.
+ * The pools (rb_send_bufs, bc_pa_list, backlog wake target) never
+ * contain a req with outstanding Send-side or Reply-side work.
+ *
+ * Non-hazards. The following claims have been raised by adversarial
+ * review and are each closed by the invariants above:
+ *
+ * * "Reply completes the RPC while the HCA still holds the send
+ * buffer" -- excluded by I4. The Send-side kref reference is held
+ * until rpcrdma_sendctx_unmap() runs from Send completion.
+ *
+ * * "Signal-driven release races the in-flight Send" -- same
+ * resolution. xprt_rdma_free() does not touch rl_kref; the
+ * Send-side reference keeps the req out of its pool until Send
+ * completion fires.
+ *
+ * * "Receive completion races rep reuse" -- excluded by I1. A rep
+ * is on rb_free_reps only after rpcrdma_rep_put() has been called
+ * and rpcrdma_post_recvs() owns the next transition back to the HCA.
+ *
+ * * "Pages still DMA-mapped when call_decode reads them" -- excluded
+ * by I3. The matching ib_dma_unmap_sg() for every MR has run on
+ * the same CPU thread that calls rpcrdma_complete_rqst().
+ */
+
/**
* rpcrdma_reply_handler - Process received RPC/RDMA messages
* @rep: Incoming rpcrdma_rep object to process
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 5569f17fdd9b..5ff8e5126a6c 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -509,6 +509,11 @@ static void rpcrdma_req_release(struct kref *kref)
struct rpc_xprt *xprt = rqst->rq_xprt;
struct rpcrdma_xprt *r_xprt;
+ /* I4: both the RPC-layer and Send-side owners have dropped,
+ * so rpcrdma_sendctx_unmap() has cleared rl_sendctx.
+ */
+ WARN_ON_ONCE(req->rl_sendctx);
+
kref_init(&req->rl_kref);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
@@ -652,10 +657,10 @@ xprt_rdma_free(struct rpc_task *task)
frwr_unmap_sync(rpcx_to_rdmax(rqst->rq_xprt), req);
}
- /* XXX: If the RPC is completing because of a signal and
- * not because a reply was received, we ought to ensure
- * that the Send completion has fired, so that memory
- * involved with the Send is not still visible to the NIC.
+ /* The Send-side rl_kref owner keeps req out of its free pool
+ * until rpcrdma_sendctx_unmap() has fired -- see I4 above
+ * rpcrdma_reply_handler() -- so signal-driven release here
+ * does not let the HCA touch a recycled send buffer.
*/
}
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 60cbc14c5299..da2c6fa44154 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1259,6 +1259,10 @@ void rpcrdma_reply_put(struct rpcrdma_buffer *buffers, struct rpcrdma_req *req)
req->rl_reply = NULL;
rpcrdma_rep_put(buffers, rep);
}
+ /* I2: rl_reply NULL after the put closes the
+ * 'rep on rb_free_reps still referenced by req' window.
+ */
+ WARN_ON_ONCE(req->rl_reply);
}
/**
@@ -1435,6 +1439,8 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
rep = rpcrdma_rep_create(r_xprt);
if (!rep)
break;
+ /* I1: a rep on rb_free_reps must carry no rqst pointer. */
+ WARN_ON_ONCE(rep->rr_rqst);
if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
rpcrdma_rep_put(buf, rep);
break;
--
2.54.0
prev parent reply other threads:[~2026-05-26 14:14 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 14:14 [PATCH v3 0/5] xprtrdma: Decouple req recycling from RPC completion Chuck Lever
2026-05-26 14:14 ` [PATCH v3 1/5] xprtrdma: Use sendctx DMA state for Send signaling Chuck Lever
2026-05-26 14:14 ` [PATCH v3 2/5] xprtrdma: Decouple req recycling from RPC completion Chuck Lever
2026-05-26 14:14 ` [PATCH v3 3/5] xprtrdma: Add request-pool slack for delayed recycling Chuck Lever
2026-05-26 14:14 ` [PATCH v3 4/5] xprtrdma: Clear receive-side ownership pointers on release Chuck Lever
2026-05-26 14:14 ` Chuck Lever [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260526141405.39877-6-cel@kernel.org \
--to=cel@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.