public inbox for linux-rdma@vger.kernel.org
From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH 14/15] svcrdma: retry when receive queues drain transiently
Date: Tue, 10 Feb 2026 11:32:21 -0500
Message-ID: <20260210163222.2356793-15-cel@kernel.org>
In-Reply-To: <20260210163222.2356793-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

When svc_rdma_recvfrom finds both sc_read_complete_q
and sc_rq_dto_q empty, svc_rdma_update_xpt_data clears
XPT_DATA, issues a memory barrier, and rechecks both
queues. If a completion arrived between the
llist_del_first calls and the recheck, XPT_DATA is set
again, but svc_rdma_recvfrom still returns zero. The
thread then traverses the full svc_recv cycle -- page
allocation, dequeue, recvfrom, release -- only to find
the item that was already available at the time of the
recheck.
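
For reference, a minimal sketch of the recheck logic
as this series is assumed to shape it (the helper is
introduced in an earlier patch of the series, so its
exact body may differ):

	static void svc_rdma_update_xpt_data(struct svcxprt_rdma *rdma)
	{
		struct svc_xprt *xprt = &rdma->sc_xprt;

		/* Clear first, then recheck: a completion that
		 * arrives between these two steps marks the
		 * transport as having data again.
		 */
		clear_bit(XPT_DATA, &xprt->xpt_flags);
		smp_mb__after_atomic();
		if (!llist_empty(&rdma->sc_read_complete_q) ||
		    !llist_empty(&rdma->sc_rq_dto_q))
			set_bit(XPT_DATA, &xprt->xpt_flags);
	}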

Trace data from a 256KB NFSv3 workload over RDMA shows
that 267,848 of 464,355 transport dequeues (57.7%) are
such empty bounces, each costing roughly 37 us. During
the READ phase, empty bounces consume 8.6% of thread
capacity and inflate inter-RPC gaps by an average of
87 us.
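At 37 us apiece, those 267,848 bounces amount to
roughly 9.9 seconds of aggregate thread time over the
run.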

The calling thread holds XPT_BUSY for the duration, so
no other consumer can drain the queue between the
recheck and the retry. A retry is therefore guaranteed
to find data on its first iteration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2ee9819a53d7..a124c6ed057a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -981,6 +981,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 
 	rqstp->rq_xprt_ctxt = NULL;
 
+retry:
 	node = llist_del_first(&rdma_xprt->sc_read_complete_q);
 	if (node) {
 		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
@@ -995,8 +996,17 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
 	} else {
 		ctxt = NULL;
-		/* No new incoming requests, terminate the loop */
 		svc_rdma_update_xpt_data(rdma_xprt);
+		/*
+		 * A completion may have arrived between the
+		 * llist_del_first above and the queue recheck
+		 * inside svc_rdma_update_xpt_data. This thread
+		 * holds XPT_BUSY, preventing any other consumer
+		 * from draining the queue in the meantime.
+		 * Retry to avoid a full svc_recv round-trip.
+		 */
+		if (test_bit(XPT_DATA, &xprt->xpt_flags))
+			goto retry;
 	}
 
 	/* Unblock the transport for the next receive */
-- 
2.52.0


Thread overview: 16 messages
2026-02-10 16:32 [RFC PATCH 00/15] svcrdma performance scalability enhancements Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 01/15] svcrdma: Add fair queuing for Send Queue access Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 02/15] svcrdma: Clean up use of rdma->sc_pd->device in Receive paths Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 03/15] svcrdma: Clean up use of rdma->sc_pd->device Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 04/15] svcrdma: Add Write chunk WRs to the RPC's Send WR chain Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 05/15] svcrdma: Factor out WR chain linking into helper Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 06/15] svcrdma: Reduce false sharing in struct svcxprt_rdma Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 07/15] svcrdma: Use lock-free list for Receive Queue tracking Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 08/15] svcrdma: Convert Read completion queue to use lock-free list Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 09/15] svcrdma: Release write chunk resources without re-queuing Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 10/15] svcrdma: Use per-transport kthread for send context release Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 11/15] svcrdma: Use watermark-based Receive Queue replenishment Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 12/15] svcrdma: Add per-recv_ctxt chunk context cache Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 13/15] svcrdma: clear XPT_DATA on sc_read_complete_q consumption Chuck Lever
2026-02-10 16:32 ` Chuck Lever [this message]
2026-02-10 16:32 ` [RFC PATCH 15/15] svcrdma: clear XPT_DATA on sc_rq_dto_q consumption Chuck Lever
