From: Chuck Lever <cel@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Christoph Hellwig <hch@lst.de>
Cc: NeilBrown <neilb@ownmail.net>, Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
<linux-rdma@vger.kernel.org>, <linux-nfs@vger.kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v3 4/5] RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
Date: Thu, 22 Jan 2026 17:04:00 -0500 [thread overview]
Message-ID: <20260122220401.1143331-5-cel@kernel.org> (raw)
In-Reply-To: <20260122220401.1143331-1-cel@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the
number of rdma_rw contexts (ctxts). This value is used to allocate the
Send CQ and to initialize the sc_sq_avail credit pool.
However, when the device uses memory registration for RDMA operations,
rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three
per context to account for REG and INV work requests. The Send CQ and
credit pool remain sized for only one work request per context,
causing Send Queue exhaustion under heavy NFS WRITE workloads.
Introduce rdma_rw_max_sge() to compute the actual number of Send Queue
entries required for a given number of rdma_rw contexts. Upper layer
protocols call this helper before creating a Queue Pair so that their
Send CQs and credit accounting match the QP's true capacity.
Update svc_rdma_accept() to use rdma_rw_max_sge() when computing
sc_sq_depth, ensuring the credit pool reflects the work requests
that rdma_rw_init_qp() will reserve.
Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
drivers/infiniband/core/rw.c | 53 +++++++++++++++++-------
include/rdma/rw.h | 2 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 +++-
3 files changed, 46 insertions(+), 17 deletions(-)
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 3a00b788417d..4bca8a8ab695 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -1099,34 +1099,57 @@ unsigned int rdma_rw_mr_factor(struct ib_device *device, u32 port_num,
}
EXPORT_SYMBOL(rdma_rw_mr_factor);
+/**
+ * rdma_rw_max_send_wr - compute max Send WRs needed for RDMA R/W contexts
+ * @dev: RDMA device
+ * @port_num: port number
+ * @max_rdma_ctxs: number of rdma_rw_ctx structures
+ * @create_flags: QP create flags (pass IB_QP_CREATE_INTEGRITY_EN if
+ * data integrity will be enabled on the QP)
+ *
+ * Returns the total number of Send Queue entries needed for
+ * @max_rdma_ctxs. The result accounts for memory registration and
+ * invalidation work requests when the device requires them.
+ *
+ * ULPs use this to size Send Queues and Send CQs before creating a
+ * Queue Pair.
+ */
+unsigned int rdma_rw_max_send_wr(struct ib_device *dev, u32 port_num,
+ unsigned int max_rdma_ctxs, u32 create_flags)
+{
+ unsigned int factor = 1;
+ unsigned int result;
+
+ if (create_flags & IB_QP_CREATE_INTEGRITY_EN ||
+ rdma_rw_can_use_mr(dev, port_num))
+ factor += 2; /* reg + inv */
+
+ if (check_mul_overflow(factor, max_rdma_ctxs, &result))
+ return UINT_MAX;
+ return result;
+}
+EXPORT_SYMBOL(rdma_rw_max_send_wr);
+
void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
{
- u32 factor;
+ unsigned int factor = 1;
WARN_ON_ONCE(attr->port_num == 0);
/*
- * Each context needs at least one RDMA READ or WRITE WR.
- *
- * For some hardware we might need more, eventually we should ask the
- * HCA driver for a multiplier here.
- */
- factor = 1;
-
- /*
- * If the device needs MRs to perform RDMA READ or WRITE operations,
- * we'll need two additional MRs for the registrations and the
- * invalidation.
+ * If the device uses MRs to perform RDMA READ or WRITE operations,
+ * or if data integrity is enabled, account for registration and
+ * invalidation work requests.
*/
if (attr->create_flags & IB_QP_CREATE_INTEGRITY_EN ||
rdma_rw_can_use_mr(dev, attr->port_num))
- factor += 2; /* inv + reg */
+ factor += 2; /* reg + inv */
attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;
/*
- * But maybe we were just too high in the sky and the device doesn't
- * even support all we need, and we'll have to live with what we get..
+ * The device might not support all we need, and we'll have to
+ * live with what we get.
*/
attr->cap.max_send_wr =
min_t(u32, attr->cap.max_send_wr, dev->attrs.max_qp_wr);
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
index 53ed0f05fa25..5f96ff754be7 100644
--- a/include/rdma/rw.h
+++ b/include/rdma/rw.h
@@ -88,6 +88,8 @@ int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
unsigned int rdma_rw_mr_factor(struct ib_device *device, u32 port_num,
unsigned int maxpages);
+unsigned int rdma_rw_max_send_wr(struct ib_device *dev, u32 port_num,
+ unsigned int max_rdma_ctxs, u32 create_flags);
void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
void rdma_rw_cleanup_mrs(struct ib_qp *qp);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index b7b318ad25c4..9b623849723e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -462,7 +462,10 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
newxprt->sc_max_bc_requests = 2;
}
- /* Arbitrary estimate of the needed number of rdma_rw contexts.
+ /* Estimate the needed number of rdma_rw contexts. The maximum
+ * Read and Write chunks have one segment each. Each request
+ * can involve one Read chunk and either a Write chunk or Reply
+ * chunk; thus a factor of three.
*/
maxpayload = min(xprt->xpt_server->sv_max_payload,
RPCSVC_MAXPAYLOAD_RDMA);
@@ -470,7 +473,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
rdma_rw_mr_factor(dev, newxprt->sc_port_num,
maxpayload >> PAGE_SHIFT);
- newxprt->sc_sq_depth = rq_depth + ctxts;
+ newxprt->sc_sq_depth = rq_depth +
+ rdma_rw_max_send_wr(dev, newxprt->sc_port_num, ctxts, 0);
if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
--
2.52.0
next prev parent reply other threads:[~2026-01-22 22:04 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-22 22:03 [PATCH v3 0/5] Add a bio_vec based API to core/rw.c Chuck Lever
2026-01-22 22:03 ` [PATCH v3 1/5] RDMA/core: add bio_vec based RDMA read/write API Chuck Lever
2026-01-23 6:26 ` Christoph Hellwig
2026-01-22 22:03 ` [PATCH v3 2/5] RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations Chuck Lever
2026-01-23 6:28 ` Christoph Hellwig
2026-01-23 15:04 ` Chuck Lever
2026-01-26 6:14 ` Christoph Hellwig
2026-01-22 22:03 ` [PATCH v3 3/5] RDMA/core: add MR support for bvec-based " Chuck Lever
2026-01-23 6:36 ` Christoph Hellwig
2026-01-23 15:06 ` Chuck Lever
2026-01-26 6:17 ` Christoph Hellwig
2026-01-26 16:48 ` Chuck Lever
2026-01-23 16:47 ` Chuck Lever
2026-01-26 6:16 ` Christoph Hellwig
2026-01-22 22:04 ` Chuck Lever [this message]
2026-01-23 6:36 ` [PATCH v3 4/5] RDMA/core: add rdma_rw_max_sge() helper for SQ sizing Christoph Hellwig
2026-01-22 22:04 ` [PATCH v3 5/5] svcrdma: use bvec-based RDMA read/write API Chuck Lever
2026-01-23 6:04 ` [PATCH v3 0/5] Add a bio_vec based API to core/rw.c Zhu Yanjun
2026-01-23 14:13 ` Chuck Lever
2026-01-24 18:19 ` Zhu Yanjun
2026-01-26 17:13 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260122220401.1143331-5-cel@kernel.org \
--to=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=hch@lst.de \
--cc=jgg@nvidia.com \
--cc=jlayton@kernel.org \
--cc=leon@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=neilb@ownmail.net \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox