From: Chuck Lever <cel@kernel.org>
To: Anna Schumaker <anna@kernel.org>
Cc: <linux-rdma@vger.kernel.org>, <linux-nfs@vger.kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v3 5/5] xprtrdma: Document and assert reply-handler invariants
Date: Tue, 26 May 2026 10:14:05 -0400 [thread overview]
Message-ID: <20260526141405.39877-6-cel@kernel.org> (raw)
In-Reply-To: <20260526141405.39877-1-cel@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>
The xprtrdma reply path has been the subject of recurring
LLM-driven review claims that 'an RPC can complete while
receive buffers are still DMA-mapped' or that 'the req can be
freed while the HCA still owns the send buffer.' No runtime
reproducer has surfaced, but the absence of a written-down
invariant set lets each pass of automated review reach the
same hypothetical conclusion. Subsequent fixes against
ce2f9a4d9ccc ('xprtrdma: Decouple req recycling from RPC
completion') closed the underlying races but did not document
the closure where future readers will look for it.
State the invariants explicitly in a comment above
rpcrdma_reply_handler() and back four of them with
WARN_ON_ONCE() probes positioned where each invariant is
locally checkable on the previous patch's cleaned-up
ownership state:
- I1 (Receive WR ownership): WARN at rpcrdma_post_recvs() that
a rep pulled from rb_free_reps carries rr_rqst == NULL.
- I2 (rep attachment): WARN at rpcrdma_reply_put() that
req->rl_reply was NULLed before the matching rep_put.
- I3 (Registered-MR fence): WARN at rpcrdma_complete_rqst()
that req->rl_registered is empty. Strong send-queue
ordering of the LocalInv WR chain makes the last
completion observe the ib_dma_unmap_sg() of every earlier
MR, so 'list empty' implies 'all MRs unmapped'.
- I4 (Send-buffer release): WARN at rpcrdma_req_release()
that req->rl_sendctx is NULL. Reaching the kref release
callback requires both the RPC-layer and Send-side
references to have dropped; the Send-side drop runs in
rpcrdma_sendctx_unmap(), which clears rl_sendctx
(previous patch). A non-NULL rl_sendctx here would mean
the Send-side owner had not run -- a contradiction.
The XXX comment in xprt_rdma_free() about signal-driven
release racing the Send completion described the pre-decouple
state. Replace it with a one-line note pointing at the
invariant set, since the kref scheme now holds the req across
the in-flight Send regardless of which path released the
rpc_task.
I5 (req lifecycle) is stated in the comment but not probed:
making it locally assertible would require moving kref_init
out of rpcrdma_req_release(), which in turn requires adding
kref_init to the bc_pa_list and backlog-wake reuse paths.
That restructuring is deferred -- the invariant is unchanged
either way.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 69 +++++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/transport.c | 13 +++++--
net/sunrpc/xprtrdma/verbs.c | 6 +++
3 files changed, 84 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f4b4abefc4e0..626cadec4555 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1336,6 +1336,11 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
struct rpc_rqst *rqst = rep->rr_rqst;
int status;
+ /* I3: every registered MR has been invalidated and
+ * ib_dma_unmap_sg()'d before complete_rqst runs.
+ */
+ WARN_ON_ONCE(!list_empty(&rpcr_to_rdmar(rqst)->rl_registered));
+
switch (rep->rr_proc) {
case rdma_msg:
status = rpcrdma_decode_msg(r_xprt, rep, rqst);
@@ -1367,6 +1372,70 @@ void rpcrdma_complete_rqst(struct rpcrdma_rep *rep)
goto out;
}
+/* Reply-side ownership invariants
+ *
+ * I1 (Receive WR ownership). A struct rpcrdma_rep is owned by the
+ * HCA between ib_post_recv() and the matching Receive completion.
+ * After ib_dma_sync_single_for_cpu() in rpcrdma_wc_receive() it is
+ * owned by the CPU until rpcrdma_rep_put() returns it to
+ * rb_free_reps; a rep on rb_free_reps is not re-posted until
+ * rpcrdma_post_recvs() pulls it off. Asserted: rpcrdma_post_recvs()
+ * WARNs that a pulled rep has rr_rqst == NULL.
+ *
+ * I2 (rep attachment). While req->rl_reply == rep, the rep cannot be
+ * re-posted. rpcrdma_reply_put() NULLs req->rl_reply before handing
+ * the rep to rpcrdma_rep_put(). Asserted: rpcrdma_reply_put() WARNs
+ * that rl_reply is NULL after the put.
+ *
+ * I3 (Registered-MR fence). On entry to rpcrdma_complete_rqst() every
+ * MR that was on req->rl_registered has had its rkey invalidated
+ * (remotely via IB_WC_WITH_INVALIDATE or locally via IB_WR_LOCAL_INV)
+ * and its pages ib_dma_unmap_sg()'d. The LocalInv chain is posted
+ * on a single QP; strong send-queue ordering makes the last
+ * completion (frwr_wc_localinv_done) observe the
+ * ib_dma_unmap_sg() that ran from each earlier completion's
+ * frwr_mr_put() before complete_rqst is called. The inline
+ * frwr_reminv() path unmaps its one MR synchronously before
+ * rpcrdma_reply_handler() reaches complete_rqst. Asserted:
+ * rpcrdma_complete_rqst() WARNs that rl_registered is empty.
+ *
+ * I4 (Send-buffer release). req->rl_kref carries two unconditional
+ * owners while a Send is outstanding: the RPC-layer reference (set
+ * at xprt_rdma_alloc_slot / xprt_rdma_bc_rqst_get / rpcrdma_req_release
+ * pool-entry) and the Send-side reference (kref_get() in
+ * rpcrdma_prepare_send_sges()). rpcrdma_req_release() runs only
+ * after both have dropped, so the req does not return to its free
+ * pool until rpcrdma_sendctx_unmap() has fired -- the HCA has
+ * released the send buffer before the req can be reused. Asserted:
+ * rpcrdma_req_release() WARNs that rl_sendctx is NULL.
+ *
+ * I5 (req lifecycle). A req is owned by the RPC layer between slot
+ * acquisition and the matching xprt_rdma_free_slot() (or, for the
+ * backchannel, xprt_rdma_bc_free_rqst()). While owned, rl_kref >= 1.
+ * The pools (rb_send_bufs, bc_pa_list, backlog wake target) never
+ * contain a req with outstanding Send-side or Reply-side work.
+ *
+ * Non-hazards. The following claims have been raised by adversarial
+ * review and are each closed by the invariants above:
+ *
+ * * "Reply completes the RPC while the HCA still holds the send
+ * buffer" -- excluded by I4. The Send-side kref reference is held
+ * until rpcrdma_sendctx_unmap() runs from Send completion.
+ *
+ * * "Signal-driven release races the in-flight Send" -- same
+ * resolution. xprt_rdma_free() does not touch rl_kref; the
+ * Send-side reference keeps the req out of its pool until Send
+ * completion fires.
+ *
+ * * "Receive completion races rep reuse" -- excluded by I1. A rep
+ * is on rb_free_reps only after rpcrdma_rep_put() has been called
+ * and rpcrdma_post_recvs() owns the next transition back to the HCA.
+ *
+ * * "Pages still DMA-mapped when call_decode reads them" -- excluded
+ * by I3. The matching ib_dma_unmap_sg() for every MR has run on
+ * the same CPU thread that calls rpcrdma_complete_rqst().
+ */
+
/**
* rpcrdma_reply_handler - Process received RPC/RDMA messages
* @rep: Incoming rpcrdma_rep object to process
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 5569f17fdd9b..5ff8e5126a6c 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -509,6 +509,11 @@ static void rpcrdma_req_release(struct kref *kref)
struct rpc_xprt *xprt = rqst->rq_xprt;
struct rpcrdma_xprt *r_xprt;
+ /* I4: both the RPC-layer and Send-side owners have dropped,
+ * so rpcrdma_sendctx_unmap() has cleared rl_sendctx.
+ */
+ WARN_ON_ONCE(req->rl_sendctx);
+
kref_init(&req->rl_kref);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
@@ -652,10 +657,10 @@ xprt_rdma_free(struct rpc_task *task)
frwr_unmap_sync(rpcx_to_rdmax(rqst->rq_xprt), req);
}
- /* XXX: If the RPC is completing because of a signal and
- * not because a reply was received, we ought to ensure
- * that the Send completion has fired, so that memory
- * involved with the Send is not still visible to the NIC.
+ /* The Send-side rl_kref owner keeps req out of its free pool
+ * until rpcrdma_sendctx_unmap() has fired -- see I4 above
+ * rpcrdma_reply_handler() -- so signal-driven release here
+ * does not let the HCA touch a recycled send buffer.
*/
}
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 60cbc14c5299..da2c6fa44154 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1259,6 +1259,10 @@ void rpcrdma_reply_put(struct rpcrdma_buffer *buffers, struct rpcrdma_req *req)
req->rl_reply = NULL;
rpcrdma_rep_put(buffers, rep);
}
+ /* I2: rl_reply NULL after the put closes the
+ * 'rep on rb_free_reps still referenced by req' window.
+ */
+ WARN_ON_ONCE(req->rl_reply);
}
/**
@@ -1435,6 +1439,8 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
rep = rpcrdma_rep_create(r_xprt);
if (!rep)
break;
+ /* I1: a rep on rb_free_reps must carry no rqst pointer. */
+ WARN_ON_ONCE(rep->rr_rqst);
if (!rpcrdma_regbuf_dma_map(r_xprt, rep->rr_rdmabuf)) {
rpcrdma_rep_put(buf, rep);
break;
--
2.54.0
prev parent reply other threads:[~2026-05-26 14:14 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 14:14 [PATCH v3 0/5] xprtrdma: Decouple req recycling from RPC completion Chuck Lever
2026-05-26 14:14 ` [PATCH v3 1/5] xprtrdma: Use sendctx DMA state for Send signaling Chuck Lever
2026-05-26 14:14 ` [PATCH v3 2/5] xprtrdma: Decouple req recycling from RPC completion Chuck Lever
2026-05-26 14:14 ` [PATCH v3 3/5] xprtrdma: Add request-pool slack for delayed recycling Chuck Lever
2026-05-26 14:14 ` [PATCH v3 4/5] xprtrdma: Clear receive-side ownership pointers on release Chuck Lever
2026-05-26 14:14 ` Chuck Lever [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260526141405.39877-6-cel@kernel.org \
--to=cel@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox