All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v2 13/18] svcrdma: clear XPT_DATA on sc_read_complete_q consumption
Date: Fri, 27 Feb 2026 09:03:40 -0500	[thread overview]
Message-ID: <20260227140345.40488-14-cel@kernel.org> (raw)
In-Reply-To: <20260227140345.40488-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

svc_rdma_wc_read_done() sets XPT_DATA when adding a
completed RDMA Read context to sc_read_complete_q. The
consumer in svc_rdma_recvfrom() takes the context but
leaves XPT_DATA set. The subsequent svc_xprt_received()
clears XPT_BUSY and re-enqueues the transport; because
XPT_DATA remains set, a second thread awakens. That thread
finds both queues empty, accomplishes nothing, and releases
its slot and reservation.

Trace data from a 256KB NFSv3 WRITE workload over RDMA
shows approximately 14 enqueue attempts per RPC, with 62%
returning immediately due to no pending data. The majority
originate from this spurious dispatch path.

After clearing XPT_DATA to acknowledge consumption, the
XPT_DATA state must be recomputed from both queue states.
A concurrent producer may call llist_add and then
set_bit(XPT_DATA) between this consumer's llist_del_first
and the clear_bit, causing clear_bit to erase the producer's
signal. An smp_mb__after_atomic() barrier after clear_bit
pairs with the implicit barrier in each producer's llist_add
cmpxchg, ensuring llist_empty rechecks observe any add whose
set_bit was erased. This barrier requirement applies at both
call sites: the new sc_read_complete_q path and the
pre-existing sc_rq_dto_q "both queues empty" path.

A new helper svc_rdma_update_xpt_data() centralizes this
clear/barrier/recheck/set pattern to ensure both locations
maintain the required memory ordering.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 33 ++++++++++++++++---------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 45edf57c7285..54545fcd8762 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -917,6 +917,25 @@ static noinline void svc_rdma_read_complete(struct svc_rqst *rqstp,
 	trace_svcrdma_read_finished(&ctxt->rc_cid);
 }
 
+/*
+ * Recompute XPT_DATA from queue state after consuming a completion. A
+ * concurrent producer may have called llist_add and then set_bit(XPT_DATA)
+ * between this consumer's llist_del_first and the clear_bit below, causing
+ * clear_bit to erase the producer's signal. The barrier pairs with the
+ * implicit barrier in each producer's llist_add so that the llist_empty
+ * rechecks observe any add whose set_bit was erased.
+ */
+static void svc_rdma_update_xpt_data(struct svcxprt_rdma *rdma)
+{
+	struct svc_xprt *xprt = &rdma->sc_xprt;
+
+	clear_bit(XPT_DATA, &xprt->xpt_flags);
+	smp_mb__after_atomic();
+	if (!llist_empty(&rdma->sc_rq_dto_q) ||
+	    !llist_empty(&rdma->sc_read_complete_q))
+		set_bit(XPT_DATA, &xprt->xpt_flags);
+}
+
 /**
  * svc_rdma_recvfrom - Receive an RPC call
  * @rqstp: request structure into which to receive an RPC Call
@@ -965,6 +984,8 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	node = llist_del_first(&rdma_xprt->sc_read_complete_q);
 	if (node) {
 		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
+
+		svc_rdma_update_xpt_data(rdma_xprt);
 		svc_xprt_received(xprt);
 		svc_rdma_read_complete(rqstp, ctxt);
 		goto complete;
@@ -975,17 +996,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	} else {
 		ctxt = NULL;
 		/* No new incoming requests, terminate the loop */
-		clear_bit(XPT_DATA, &xprt->xpt_flags);
-
-		/*
-		 * If a completion arrived after llist_del_first but
-		 * before clear_bit, the producer's set_bit would be
-		 * cleared above. Recheck both queues to close this
-		 * race window.
-		 */
-		if (!llist_empty(&rdma_xprt->sc_rq_dto_q) ||
-		    !llist_empty(&rdma_xprt->sc_read_complete_q))
-			set_bit(XPT_DATA, &xprt->xpt_flags);
+		svc_rdma_update_xpt_data(rdma_xprt);
 	}
 
 	/* Unblock the transport for the next receive */
-- 
2.53.0


  parent reply	other threads:[~2026-02-27 14:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27 14:03 [PATCH v2 00/18] svcrdma performance scalability enhancements Chuck Lever
2026-02-27 14:03 ` [PATCH v2 01/18] svcrdma: Add fair queuing for Send Queue access Chuck Lever
2026-02-27 14:03 ` [PATCH v2 02/18] svcrdma: Clean up use of rdma->sc_pd->device in Receive paths Chuck Lever
2026-02-27 14:03 ` [PATCH v2 03/18] svcrdma: Clean up use of rdma->sc_pd->device Chuck Lever
2026-02-27 14:03 ` [PATCH v2 04/18] svcrdma: Add Write chunk WRs to the RPC's Send WR chain Chuck Lever
2026-02-27 14:03 ` [PATCH v2 05/18] svcrdma: Factor out WR chain linking into helper Chuck Lever
2026-02-27 14:03 ` [PATCH v2 06/18] svcrdma: Reduce false sharing in struct svcxprt_rdma Chuck Lever
2026-02-27 14:03 ` [PATCH v2 07/18] svcrdma: Use lock-free list for Receive Queue tracking Chuck Lever
2026-02-27 14:03 ` [PATCH v2 08/18] svcrdma: Convert Read completion queue to use lock-free list Chuck Lever
2026-02-27 14:03 ` [PATCH v2 09/18] svcrdma: Release write chunk resources without re-queuing Chuck Lever
2026-02-27 14:03 ` [PATCH v2 10/18] svcrdma: Defer send context release to xpo_release_ctxt Chuck Lever
2026-02-27 14:03 ` [PATCH v2 11/18] svcrdma: Use watermark-based Receive Queue replenishment Chuck Lever
2026-02-27 14:03 ` [PATCH v2 12/18] svcrdma: Add per-recv_ctxt chunk context cache Chuck Lever
2026-02-27 14:03 ` Chuck Lever [this message]
2026-02-27 14:03 ` [PATCH v2 14/18] svcrdma: retry when receive queues drain transiently Chuck Lever
2026-02-27 14:03 ` [PATCH v2 15/18] svcrdma: clear XPT_DATA on sc_rq_dto_q consumption Chuck Lever
2026-02-27 14:03 ` [PATCH v2 16/18] sunrpc: skip svc_xprt_enqueue when no work is pending Chuck Lever
2026-02-27 14:03 ` [PATCH v2 17/18] sunrpc: skip svc_xprt_enqueue in svc_xprt_received when idle Chuck Lever
2026-02-27 14:03 ` [PATCH v2 18/18] sunrpc: Skip xpt_reserved accounting for non-UDP transports Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260227140345.40488-14-cel@kernel.org \
    --to=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@ownmail.net \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.