From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v2 11/18] svcrdma: Use watermark-based Receive Queue replenishment
Date: Fri, 27 Feb 2026 09:03:38 -0500
Message-ID: <20260227140345.40488-12-cel@kernel.org>
In-Reply-To: <20260227140345.40488-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

The current Receive posting strategy posts a small fixed batch of
Receives on every completion whenever the queue depth drops below
the maximum. At high message rates this results in frequent
ib_post_recv() calls, each of which incurs doorbell overhead.

The Receive Queue is now provisioned with twice the negotiated
credit limit (sc_max_requests). Replenishment is triggered when the
number of posted Receives drops below the credit limit (the low
watermark), posting enough Receives to refill the queue to capacity.

For a typical configuration with a credit limit of 128:
- Receive Queue depth: 256
- Low watermark: 128 (replenish when half consumed)
- Batch size: ~128 Receives per posting

Tying the watermark to the credit limit rather than a percentage of
queue capacity ensures adequate buffering regardless of the
configured credit limit. Even with a small credit limit, at least
one full credit window remains posted, guaranteeing forward
progress.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          | 23 +++++++++++++-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  | 40 ++++++++++++++++--------
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 13 ++++----
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 562a5f78cd3f..ef52af656581 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -128,7 +128,6 @@ struct svcxprt_rdma {
 
 	/* Receive path */
 	u32		     sc_pending_recvs ____cacheline_aligned_in_smp;
-	u32		     sc_recv_batch;
 	struct llist_head    sc_rq_dto_q;
 	struct llist_head    sc_read_complete_q;
 
@@ -163,6 +162,28 @@ enum {
 	RPCRDMA_MAX_BC_REQUESTS	= 2,
 };
 
+/*
+ * Receive Queue provisioning constants for watermark-based replenishment.
+ *
+ * Queue depth is twice the credit limit to support batched
+ * posting that reduces doorbell overhead. When posted receives
+ * drop below the credit limit (the low watermark),
+ * svc_rdma_wc_receive() posts enough Receives to refill the
+ * queue to capacity.
+ */
+enum {
+	/* Queue depth = sc_max_requests * multiplier */
+	SVCRDMA_RQ_DEPTH_MULT		= 2,
+
+	/* Total recv_ctxt pool = sc_max_requests * multiplier
+	 * (RQ_DEPTH_MULT for posted receives + 1 for RPCs in process)
+	 */
+	SVCRDMA_RECV_CTXT_MULT		= SVCRDMA_RQ_DEPTH_MULT + 1,
+
+	/* rdma_rw contexts per request: Read + Write + Reply chunks */
+	SVCRDMA_RW_CTXT_MULT		= 3,
+};
+
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
 /**
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2281f9adc9f3..a11e845a7113 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -303,10 +303,11 @@ bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
 {
 	unsigned int total;
 
-	/* For each credit, allocate enough recv_ctxts for one
-	 * posted Receive and one RPC in process.
+	/* Allocate enough recv_ctxts for:
+	 * - SVCRDMA_RQ_DEPTH_MULT * sc_max_requests posted on the RQ
+	 * - sc_max_requests RPCs in process
 	 */
-	total = (rdma->sc_max_requests * 2) + rdma->sc_recv_batch;
+	total = rdma->sc_max_requests * SVCRDMA_RECV_CTXT_MULT;
 	while (total--) {
 		struct svc_rdma_recv_ctxt *ctxt;
 
@@ -316,7 +317,8 @@ bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
 		llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
 	}
 
-	return svc_rdma_refresh_recvs(rdma, rdma->sc_max_requests);
+	return svc_rdma_refresh_recvs(rdma,
+				rdma->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT);
 }
 
 /**
@@ -340,18 +342,30 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 		goto flushed;
 	trace_svcrdma_wc_recv(wc, &ctxt->rc_cid);
 
-	/* If receive posting fails, the connection is about to be
-	 * lost anyway. The server will not be able to send a reply
-	 * for this RPC, and the client will retransmit this RPC
-	 * anyway when it reconnects.
+	/* Watermark-based receive posting: The Receive Queue is
+	 * provisioned at SVCRDMA_RQ_DEPTH_MULT times the credit
+	 * count (sc_max_requests). When posted Receives drop below
+	 * sc_max_requests (the low watermark), this handler posts
+	 * enough Receives to refill the queue to capacity.
 	 *
-	 * Therefore we drop the Receive, even if status was SUCCESS
-	 * to reduce the likelihood of replayed requests once the
-	 * client reconnects.
+	 * Batched posting reduces doorbell rate compared to posting
+	 * a fixed small batch on every completion, while keeping
+	 * the Receive Queue populated.
+	 *
+	 * If posting fails, connection teardown is imminent. No
+	 * reply can be sent for this RPC, and the client will
+	 * retransmit after reconnecting. Drop the Receive, even
+	 * if status was SUCCESS, to reduce replay likelihood after
+	 * reconnection.
 	 */
-	if (rdma->sc_pending_recvs < rdma->sc_max_requests)
-		if (!svc_rdma_refresh_recvs(rdma, rdma->sc_recv_batch))
+	if (rdma->sc_pending_recvs < rdma->sc_max_requests) {
+		unsigned int target =
+			(rdma->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT) -
+			rdma->sc_pending_recvs;
+
+		if (!svc_rdma_refresh_recvs(rdma, target))
 			goto dropped;
+	}
 
 	/* All wc fields are now known to be valid */
 	ctxt->rc_byte_len = wc->byte_len;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 719566234277..772f02317895 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -439,7 +439,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
-	newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
 	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
 
 	/* Qualify the transport's resource defaults with the
@@ -452,12 +451,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_send_sges += (svcrdma_max_req_size / PAGE_SIZE) + 1;
 	if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge)
 		newxprt->sc_max_send_sges = dev->attrs.max_send_sge;
-	rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests +
-		   newxprt->sc_recv_batch + 1 /* drain */;
+	rq_depth = (newxprt->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT) +
+		   newxprt->sc_max_bc_requests + 1 /* drain */;
 	if (rq_depth > dev->attrs.max_qp_wr) {
+		unsigned int overhead = newxprt->sc_max_bc_requests + 1;
+
 		rq_depth = dev->attrs.max_qp_wr;
-		newxprt->sc_recv_batch = 1;
-		newxprt->sc_max_requests = rq_depth - 2;
+		newxprt->sc_max_requests =
+			(rq_depth - overhead) / SVCRDMA_RQ_DEPTH_MULT;
 		newxprt->sc_max_bc_requests = 2;
 	}
 
@@ -468,7 +469,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	 */
 	maxpayload = min(xprt->xpt_server->sv_max_payload,
 			 RPCSVC_MAXPAYLOAD_RDMA);
-	ctxts = newxprt->sc_max_requests * 3 *
+	ctxts = newxprt->sc_max_requests * SVCRDMA_RW_CTXT_MULT *
 		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
 				  maxpayload >> PAGE_SHIFT);
 
-- 
2.53.0

