public inbox for linux-rdma@vger.kernel.org
From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH 07/15] svcrdma: Use lock-free list for Receive Queue tracking
Date: Tue, 10 Feb 2026 11:32:14 -0500
Message-ID: <20260210163222.2356793-8-cel@kernel.org> (raw)
In-Reply-To: <20260210163222.2356793-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

The sc_rq_dto_lock spinlock is acquired on every Receive completion
to add the completed receive context to the sc_rq_dto_q list. Under
high message rates this lock is contended between the softirq
context that handles completions and the nfsd threads that consume
the queue.

Replace sc_rq_dto_q with a lock-free llist. Receive completions now
use llist_add(), which requires no locking. The consumer uses
llist_del_first() to retrieve one item at a time; that is safe
without a lock because the transport's XPT_BUSY serialization
ensures only one nfsd thread consumes from a given transport at a
time.

The lock remains for sc_read_complete_q, but the primary hot path
(Receive completion and consumption) no longer requires it. This
removes lock acquisition from the producer side entirely.

Note that llist provides LIFO ordering rather than FIFO. For
independent RPC requests this has no semantic impact and avoids
the overhead of reversing the list on the consumer side.
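For reference, the llist discipline this patch relies on can be
sketched in plain C11 atomics. The names below are illustrative, not
the kernel API: producers push with a compare-exchange loop (as
llist_add() does), and pop-one is only safe for a single consumer,
which is what the transport's busy-bit serialization guarantees here.

```c
/* Userspace sketch of the kernel llist pattern used by this patch.
 * Illustrative only; assumes C11 <stdatomic.h>, not kernel headers. */
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Producer side: lock-free LIFO push, callable from any context.
 * Mirrors llist_add(): retry the cmpxchg until the head is swung. */
static void lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old = atomic_load(&h->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Consumer side: pop one node, mirroring llist_del_first(). Reading
 * old->next and then swinging the head is ABA-unsafe with multiple
 * concurrent consumers, hence the single-consumer requirement. */
static struct lnode *lpop_first(struct lhead *h)
{
	struct lnode *old = atomic_load(&h->first);

	while (old &&
	       !atomic_compare_exchange_weak(&h->first, &old, old->next))
		;
	return old;
}
```

Note the LIFO behavior: the most recently pushed node is popped first,
matching the ordering caveat described above.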

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  | 38 +++++++++++++++++-------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  2 +-
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 972d446439a6..ae4300364491 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -109,7 +109,7 @@ struct svcxprt_rdma {
 	/* Receive path */
 	u32		     sc_pending_recvs ____cacheline_aligned_in_smp;
 	u32		     sc_recv_batch;
-	struct list_head     sc_rq_dto_q;
+	struct llist_head    sc_rq_dto_q;
 	struct list_head     sc_read_complete_q;
 	spinlock_t	     sc_rq_dto_lock;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 29a71fa79e2b..fd4d3fbd7054 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -361,11 +361,12 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 	/* All wc fields are now known to be valid */
 	ctxt->rc_byte_len = wc->byte_len;
 
-	spin_lock(&rdma->sc_rq_dto_lock);
-	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
-	/* Note the unlock pairs with the smp_rmb in svc_xprt_ready: */
+	llist_add(&ctxt->rc_node, &rdma->sc_rq_dto_q);
+	/*
+	 * The implicit barrier of llist_add's cmpxchg pairs with
+	 * the smp_rmb in svc_xprt_ready.
+	 */
 	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
-	spin_unlock(&rdma->sc_rq_dto_lock);
 	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
 		svc_xprt_enqueue(&rdma->sc_xprt);
 	return;
@@ -388,13 +389,16 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_recv_ctxt *ctxt;
+	struct llist_node *node;
 
 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_read_complete_q))) {
 		list_del(&ctxt->rc_list);
 		svc_rdma_recv_ctxt_put(rdma, ctxt);
 	}
-	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_rq_dto_q))) {
-		list_del(&ctxt->rc_list);
+	node = llist_del_all(&rdma->sc_rq_dto_q);
+	while (node) {
+		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
+		node = node->next;
 		svc_rdma_recv_ctxt_put(rdma, ctxt);
 	}
 }
@@ -930,6 +934,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	struct svcxprt_rdma *rdma_xprt =
 		container_of(xprt, struct svcxprt_rdma, sc_xprt);
 	struct svc_rdma_recv_ctxt *ctxt;
+	struct llist_node *node;
 	int ret;
 
 	/* Prevent svc_xprt_release() from releasing pages in rq_pages
@@ -949,13 +954,24 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		svc_rdma_read_complete(rqstp, ctxt);
 		goto complete;
 	}
-	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
-	if (ctxt)
-		list_del(&ctxt->rc_list);
-	else
+	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
+
+	node = llist_del_first(&rdma_xprt->sc_rq_dto_q);
+	if (node) {
+		ctxt = llist_entry(node, struct svc_rdma_recv_ctxt, rc_node);
+	} else {
+		ctxt = NULL;
 		/* No new incoming requests, terminate the loop */
 		clear_bit(XPT_DATA, &xprt->xpt_flags);
-	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
+
+		/*
+		 * If a completion arrived after llist_del_first but
+		 * before clear_bit, the producer's set_bit would be
+		 * cleared above. Recheck to close this race window.
+		 */
+		if (!llist_empty(&rdma_xprt->sc_rq_dto_q))
+			set_bit(XPT_DATA, &xprt->xpt_flags);
+	}
 
 	/* Unblock the transport for the next receive */
 	svc_xprt_received(xprt);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index d1dcffbf2fe7..e7f8898d09db 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -173,7 +173,7 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 
 	svc_xprt_init(net, &svc_rdma_class, &cma_xprt->sc_xprt, serv);
 	INIT_LIST_HEAD(&cma_xprt->sc_accept_q);
-	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
+	init_llist_head(&cma_xprt->sc_rq_dto_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
 	init_llist_head(&cma_xprt->sc_send_ctxts);
 	init_llist_head(&cma_xprt->sc_recv_ctxts);
-- 
2.52.0



Thread overview: 16+ messages
2026-02-10 16:32 [RFC PATCH 00/15] svcrdma performance scalability enhancements Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 01/15] svcrdma: Add fair queuing for Send Queue access Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 02/15] svcrdma: Clean up use of rdma->sc_pd->device in Receive paths Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 03/15] svcrdma: Clean up use of rdma->sc_pd->device Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 04/15] svcrdma: Add Write chunk WRs to the RPC's Send WR chain Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 05/15] svcrdma: Factor out WR chain linking into helper Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 06/15] svcrdma: Reduce false sharing in struct svcxprt_rdma Chuck Lever
2026-02-10 16:32 ` Chuck Lever [this message]
2026-02-10 16:32 ` [RFC PATCH 08/15] svcrdma: Convert Read completion queue to use lock-free list Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 09/15] svcrdma: Release write chunk resources without re-queuing Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 10/15] svcrdma: Use per-transport kthread for send context release Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 11/15] svcrdma: Use watermark-based Receive Queue replenishment Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 12/15] svcrdma: Add per-recv_ctxt chunk context cache Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 13/15] svcrdma: clear XPT_DATA on sc_read_complete_q consumption Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 14/15] svcrdma: retry when receive queues drain transiently Chuck Lever
2026-02-10 16:32 ` [RFC PATCH 15/15] svcrdma: clear XPT_DATA on sc_rq_dto_q consumption Chuck Lever
