From: Chuck Lever <cel@kernel.org>
To: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v2 11/18] svcrdma: Use watermark-based Receive Queue replenishment
Date: Fri, 27 Feb 2026 09:03:38 -0500
Message-ID: <20260227140345.40488-12-cel@kernel.org>
In-Reply-To: <20260227140345.40488-1-cel@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>

The current Receive posting strategy posts a small fixed batch of
Receives on every completion when the queue depth drops below the
maximum. At high message rates this results in frequent
ib_post_recv() calls, each incurring doorbell overhead.

The Receive Queue is now provisioned with twice the negotiated
credit limit (sc_max_requests). Replenishment is triggered when the
number of posted Receives drops below the credit limit (the low
watermark), posting enough Receives to refill the queue to capacity.

For a typical configuration with a credit limit of 128:

- Receive Queue depth: 256
- Low watermark: 128 (replenish when half consumed)
- Batch size: ~128 Receives per posting

Tying the watermark to the credit limit rather than a percentage of
queue capacity ensures adequate buffering regardless of the
configured credit limit. Even with a small credit limit, at least
one full credit window remains posted, guaranteeing forward
progress.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
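Illustration only, not part of the patch: the watermark arithmetic
described above can be checked with a small user-space sketch. The
helper names recv_refill_count() and recv_ctxt_pool_size() are
hypothetical and do not exist in the kernel; the multiplier mirrors
SVCRDMA_RQ_DEPTH_MULT, and pending_recvs is assumed to be the posted
count after the completion handler has already accounted for the
consumed Receive.

```c
#include <assert.h>

/* Mirrors SVCRDMA_RQ_DEPTH_MULT from this patch. */
enum { RQ_DEPTH_MULT = 2 };

/*
 * Hypothetical helper: how many Receives to post after a completion.
 * Returns 0 while the posted count is at or above the low watermark
 * (the credit limit); otherwise returns the number needed to refill
 * the queue to RQ_DEPTH_MULT * max_requests.
 */
static unsigned int recv_refill_count(unsigned int max_requests,
				      unsigned int pending_recvs)
{
	if (pending_recvs >= max_requests)
		return 0;
	return max_requests * RQ_DEPTH_MULT - pending_recvs;
}

/*
 * Hypothetical helper: recv_ctxt pool size, one ctxt per posted
 * Receive plus one per RPC in process (SVCRDMA_RECV_CTXT_MULT).
 */
static unsigned int recv_ctxt_pool_size(unsigned int max_requests)
{
	return max_requests * (RQ_DEPTH_MULT + 1);
}
```

With a credit limit of 128, recv_refill_count(128, 128) is 0 and
recv_refill_count(128, 127) is 129, which is where the "~128 Receives
per posting" figure comes from; recv_ctxt_pool_size(128) is 384.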
include/linux/sunrpc/svc_rdma.h | 23 +++++++++++++-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 40 ++++++++++++++++--------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 13 ++++----
3 files changed, 56 insertions(+), 20 deletions(-)
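Illustration only, not part of the patch: a user-space sketch of the
rq_depth clamping that the svc_rdma_accept() hunk performs when the
device's max_qp_wr is smaller than the requested depth. struct
rq_limits and clamp_rq_depth() are hypothetical names introduced for
this example, and the sketch assumes max_qp_wr is larger than the
backchannel-plus-drain overhead.

```c
#include <assert.h>

/* Mirrors SVCRDMA_RQ_DEPTH_MULT from this patch. */
enum { RQ_DEPTH_MULT = 2 };

/* Hypothetical container for the values svc_rdma_accept() settles on. */
struct rq_limits {
	unsigned int max_requests;
	unsigned int max_bc_requests;
	unsigned int rq_depth;
};

/*
 * Hypothetical helper following the arithmetic in the hunk: request
 * RQ_DEPTH_MULT * max_requests + backchannel + 1 (drain) entries, and
 * shrink the credit limit if the device cannot provide that many.
 * Assumes max_qp_wr > max_bc_requests + 1, otherwise the unsigned
 * subtraction would wrap.
 */
static struct rq_limits clamp_rq_depth(unsigned int max_requests,
				       unsigned int max_bc_requests,
				       unsigned int max_qp_wr)
{
	struct rq_limits l = { max_requests, max_bc_requests, 0 };

	l.rq_depth = max_requests * RQ_DEPTH_MULT + max_bc_requests + 1;
	if (l.rq_depth > max_qp_wr) {
		unsigned int overhead = l.max_bc_requests + 1;

		l.rq_depth = max_qp_wr;
		l.max_requests = (l.rq_depth - overhead) / RQ_DEPTH_MULT;
		l.max_bc_requests = 2;
	}
	return l;
}
```

For example, with 128 credits and 2 backchannel requests, a device
limit of 1024 leaves the depth at 259 and the credit limit untouched,
while a device limit of 128 clamps the depth to 128 and reduces the
credit limit to (128 - 3) / 2 = 62.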
diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 562a5f78cd3f..ef52af656581 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -128,7 +128,6 @@ struct svcxprt_rdma {
/* Receive path */
u32 sc_pending_recvs ____cacheline_aligned_in_smp;
- u32 sc_recv_batch;
struct llist_head sc_rq_dto_q;
struct llist_head sc_read_complete_q;
@@ -163,6 +162,28 @@ enum {
RPCRDMA_MAX_BC_REQUESTS = 2,
};
+/*
+ * Receive Queue provisioning constants for watermark-based replenishment.
+ *
+ * Queue depth is twice the credit limit to support batched
+ * posting that reduces doorbell overhead. When posted receives
+ * drop below the credit limit (the low watermark),
+ * svc_rdma_wc_receive() posts enough Receives to refill the
+ * queue to capacity.
+ */
+enum {
+ /* Queue depth = sc_max_requests * multiplier */
+ SVCRDMA_RQ_DEPTH_MULT = 2,
+
+ /* Total recv_ctxt pool = sc_max_requests * multiplier
+ * (RQ_DEPTH_MULT for posted receives + 1 for RPCs in process)
+ */
+ SVCRDMA_RECV_CTXT_MULT = SVCRDMA_RQ_DEPTH_MULT + 1,
+
+ /* rdma_rw contexts per request: Read + Write + Reply chunks */
+ SVCRDMA_RW_CTXT_MULT = 3,
+};
+
#define RPCSVC_MAXPAYLOAD_RDMA RPCSVC_MAXPAYLOAD
/**
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2281f9adc9f3..a11e845a7113 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -303,10 +303,11 @@ bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
{
unsigned int total;
- /* For each credit, allocate enough recv_ctxts for one
- * posted Receive and one RPC in process.
+ /* Allocate enough recv_ctxts for:
+ * - SVCRDMA_RQ_DEPTH_MULT * sc_max_requests posted on the RQ
+ * - sc_max_requests RPCs in process
*/
- total = (rdma->sc_max_requests * 2) + rdma->sc_recv_batch;
+ total = rdma->sc_max_requests * SVCRDMA_RECV_CTXT_MULT;
while (total--) {
struct svc_rdma_recv_ctxt *ctxt;
@@ -316,7 +317,8 @@ bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
}
- return svc_rdma_refresh_recvs(rdma, rdma->sc_max_requests);
+ return svc_rdma_refresh_recvs(rdma,
+ rdma->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT);
}
/**
@@ -340,18 +342,30 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
goto flushed;
trace_svcrdma_wc_recv(wc, &ctxt->rc_cid);
- /* If receive posting fails, the connection is about to be
- * lost anyway. The server will not be able to send a reply
- * for this RPC, and the client will retransmit this RPC
- * anyway when it reconnects.
+ /* Watermark-based receive posting: The Receive Queue is
+ * provisioned at SVCRDMA_RQ_DEPTH_MULT times the credit
+ * count (sc_max_requests). When posted Receives drop below
+ * sc_max_requests (the low watermark), this handler posts
+ * enough Receives to refill the queue to capacity.
*
- * Therefore we drop the Receive, even if status was SUCCESS
- * to reduce the likelihood of replayed requests once the
- * client reconnects.
+ * Batched posting reduces doorbell rate compared to posting
+ * a fixed small batch on every completion, while keeping
+ * the Receive Queue populated.
+ *
+ * If posting fails, connection teardown is imminent. No
+ * reply can be sent for this RPC, and the client will
+ * retransmit after reconnecting. Drop the Receive, even
+ * if status was SUCCESS, to reduce replay likelihood after
+ * reconnection.
*/
- if (rdma->sc_pending_recvs < rdma->sc_max_requests)
- if (!svc_rdma_refresh_recvs(rdma, rdma->sc_recv_batch))
+ if (rdma->sc_pending_recvs < rdma->sc_max_requests) {
+ unsigned int target =
+ (rdma->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT) -
+ rdma->sc_pending_recvs;
+
+ if (!svc_rdma_refresh_recvs(rdma, target))
goto dropped;
+ }
/* All wc fields are now known to be valid */
ctxt->rc_byte_len = wc->byte_len;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 719566234277..772f02317895 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -439,7 +439,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
newxprt->sc_max_req_size = svcrdma_max_req_size;
newxprt->sc_max_requests = svcrdma_max_requests;
newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
- newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
/* Qualify the transport's resource defaults with the
@@ -452,12 +451,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
newxprt->sc_max_send_sges += (svcrdma_max_req_size / PAGE_SIZE) + 1;
if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge)
newxprt->sc_max_send_sges = dev->attrs.max_send_sge;
- rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests +
- newxprt->sc_recv_batch + 1 /* drain */;
+ rq_depth = (newxprt->sc_max_requests * SVCRDMA_RQ_DEPTH_MULT) +
+ newxprt->sc_max_bc_requests + 1 /* drain */;
if (rq_depth > dev->attrs.max_qp_wr) {
+ unsigned int overhead = newxprt->sc_max_bc_requests + 1;
+
rq_depth = dev->attrs.max_qp_wr;
- newxprt->sc_recv_batch = 1;
- newxprt->sc_max_requests = rq_depth - 2;
+ newxprt->sc_max_requests =
+ (rq_depth - overhead) / SVCRDMA_RQ_DEPTH_MULT;
newxprt->sc_max_bc_requests = 2;
}
@@ -468,7 +469,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
*/
maxpayload = min(xprt->xpt_server->sv_max_payload,
RPCSVC_MAXPAYLOAD_RDMA);
- ctxts = newxprt->sc_max_requests * 3 *
+ ctxts = newxprt->sc_max_requests * SVCRDMA_RW_CTXT_MULT *
rdma_rw_mr_factor(dev, newxprt->sc_port_num,
maxpayload >> PAGE_SHIFT);
--
2.53.0