public inbox for linux-rdma@vger.kernel.org
From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH 13/15] svcrdma: clear XPT_DATA on sc_read_complete_q consumption
Date: Tue, 10 Feb 2026 11:32:20 -0500
Message-ID: <20260210163222.2356793-14-cel@kernel.org>
In-Reply-To: <20260210163222.2356793-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

svc_rdma_wc_read_done() sets XPT_DATA when adding a
completed RDMA Read context to sc_read_complete_q. The
consumer in svc_rdma_recvfrom() takes the context but
leaves XPT_DATA set. The subsequent svc_xprt_received()
clears XPT_BUSY and re-enqueues the transport; because
XPT_DATA remains set, a second thread awakens. That thread
finds both queues empty, accomplishes nothing, and releases
its slot and reservation.
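
For reference, after the llist conversion earlier in this series
(patch 08/15), the producer side follows roughly this pattern (an
illustrative paraphrase only; variable names and error handling are
omitted):

	/* Read completion handler (producer), sketched: */
	llist_add(&ctxt->rc_node, &rdma->sc_read_complete_q);
	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
	svc_xprt_enqueue(&rdma->sc_xprt);

The consumer never performs the matching clear_bit, so the producer's
set_bit stays latched across svc_xprt_received().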

Trace data from a 256KB NFSv3 WRITE workload over RDMA
shows approximately 14 enqueue attempts per RPC, with 62%
returning immediately due to no pending data. The majority
originate from this spurious dispatch path.

After clearing XPT_DATA to acknowledge consumption, the
XPT_DATA state must be recomputed from both queue states.
A concurrent producer may call llist_add and then
set_bit(XPT_DATA) between this consumer's llist_del_first
and the clear_bit, causing clear_bit to erase the producer's
signal. An smp_mb__after_atomic() barrier after clear_bit
pairs with the implicit barrier in each producer's llist_add
cmpxchg, ensuring llist_empty rechecks observe any add whose
set_bit was erased. This barrier requirement applies at both
call sites: the new sc_read_complete_q path and the
pre-existing sc_rq_dto_q "both queues empty" path.
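
The window, sketched as an interleaving (consumer on the left,
producer on the right; not exact code):

	/*
	 * svc_rdma_recvfrom (consumer)    Read completion (producer)
	 *
	 * llist_del_first()
	 *                                 llist_add(&ctxt->rc_node, ...)
	 *                                 set_bit(XPT_DATA)
	 * clear_bit(XPT_DATA)        <-- erases the producer's signal
	 * smp_mb__after_atomic()
	 * llist_empty() rechecks     <-- ordered after the clear;
	 *                                observe the producer's add
	 * set_bit(XPT_DATA)          <-- signal restored
	 */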

A new helper svc_rdma_update_xpt_data() centralizes this
clear/barrier/recheck/set pattern to ensure both locations
maintain the required memory ordering.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 33 ++++++++++++++++---------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index b48ef78c79c2..2ee9819a53d7 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -917,6 +917,25 @@ static noinline void svc_rdma_read_complete(struct svc_rqst *rqstp,
 	trace_svcrdma_read_finished(&ctxt->rc_cid);
 }
 
+/*
+ * Recompute XPT_DATA from queue state after consuming a completion. A
+ * concurrent producer may have called llist_add and then set_bit(XPT_DATA)
+ * between this consumer's llist_del_first and the clear_bit below, causing
+ * clear_bit to erase the producer's signal. The barrier pairs with the
+ * implicit barrier in each producer's llist_add so that the llist_empty
+ * rechecks observe any add whose set_bit was erased.
+ */
+static void svc_rdma_update_xpt_data(struct svcxprt_rdma *rdma)
+{
+	struct svc_xprt *xprt = &rdma->sc_xprt;
+
+	clear_bit(XPT_DATA, &xprt->xpt_flags);
+	smp_mb__after_atomic();
+	if (!llist_empty(&rdma->sc_rq_dto_q) ||
+	    !llist_empty(&rdma->sc_read_complete_q))
+		set_bit(XPT_DATA, &xprt->xpt_flags);
+}
+
 /**
  * svc_rdma_recvfrom - Receive an RPC call
  * @rqstp: request structure into which to receive an RPC Call
@@ -965,6 +984,8 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	node = llist_del_first(&rdma_xprt->sc_read_complete_q);
 	if (node) {
 		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
+
+		svc_rdma_update_xpt_data(rdma_xprt);
 		svc_xprt_received(xprt);
 		svc_rdma_read_complete(rqstp, ctxt);
 		goto complete;
@@ -975,17 +996,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	} else {
 		ctxt = NULL;
 		/* No new incoming requests, terminate the loop */
-		clear_bit(XPT_DATA, &xprt->xpt_flags);
-
-		/*
-		 * If a completion arrived after llist_del_first but
-		 * before clear_bit, the producer's set_bit would be
-		 * cleared above. Recheck both queues to close this
-		 * race window.
-		 */
-		if (!llist_empty(&rdma_xprt->sc_rq_dto_q) ||
-		    !llist_empty(&rdma_xprt->sc_read_complete_q))
-			set_bit(XPT_DATA, &xprt->xpt_flags);
+		svc_rdma_update_xpt_data(rdma_xprt);
 	}
 
 	/* Unblock the transport for the next receive */
-- 
2.52.0


Thread overview: 16+ messages
2026-02-10 16:32 [RFC PATCH 00/15] svcrdma performance scalability enhancements Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 01/15] svcrdma: Add fair queuing for Send Queue access Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 02/15] svcrdma: Clean up use of rdma->sc_pd->device in Receive paths Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 03/15] svcrdma: Clean up use of rdma->sc_pd->device Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 04/15] svcrdma: Add Write chunk WRs to the RPC's Send WR chain Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 05/15] svcrdma: Factor out WR chain linking into helper Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 06/15] svcrdma: Reduce false sharing in struct svcxprt_rdma Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 07/15] svcrdma: Use lock-free list for Receive Queue tracking Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 08/15] svcrdma: Convert Read completion queue to use lock-free list Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 09/15] svcrdma: Release write chunk resources without re-queuing Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 10/15] svcrdma: Use per-transport kthread for send context release Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 11/15] svcrdma: Use watermark-based Receive Queue replenishment Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 12/15] svcrdma: Add per-recv_ctxt chunk context cache Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 13/15] svcrdma: clear XPT_DATA on sc_read_complete_q consumption Chuck Lever [this message]
2026-02-10 16:32 ` [RFC PATCH 14/15] svcrdma: retry when receive queues drain transiently Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 15/15] svcrdma: clear XPT_DATA on sc_rq_dto_q consumption Chuck Lever
