From: Christoph Hellwig <hch@infradead.org>
To: cel@kernel.org
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Anna Schumaker <anna@kernel.org>,
linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
Chuck Lever <chuck.lever@oracle.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>
Subject: Re: [PATCH v4 01/14] svcrdma: Reduce the number of rdma_rw contexts per-QP
Date: Tue, 6 May 2025 06:08:59 -0700
Message-ID: <aBoJ64qDSp7U3twh@infradead.org>
In-Reply-To: <20250428193702.5186-2-cel@kernel.org>
On Mon, Apr 28, 2025 at 03:36:49PM -0400, cel@kernel.org wrote:
> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
> the order of the sum of qp_attr.cap.max_send_wr and a factor times
> qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
> on whether MR operations are required before RDMA Reads.
>
> This limit is not visible to RDMA consumers via dev->attrs. When the
> limit is surpassed, QP creation fails with -ENOMEM. For example:

Can we find a way to expose this limit from the HCA drivers and the
RDMA core? Having to guess it in the ULP feels rather cumbersome.

In the meantime this patch looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

>
> svcrdma's estimate of the number of rdma_rw contexts it needs is
> three times the number of pages in RPCSVC_MAXPAGES. When MAXPAGES
> is about 260, the internally-computed SQ length should be:
>
> 64 credits + 10 backlog + 3 * (3 * 260) = 2414
>
> Which is well below the advertised qp_max_wr of 32768.
>
> If RPCSVC_MAXPAGES is increased to 4MB, that's 1040 pages:
>
> 64 credits + 10 backlog + 3 * (3 * 1040) = 9434
>
> However, QP creation fails. Dynamic printk for mlx5 shows:
>
> calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
>
> Although 9326 is still far below qp_max_wr, QP creation still
> fails.
>
> Because the total SQ length calculation is opaque to RDMA consumers,
> there doesn't seem to be much that can be done about this except for
> consumers to try to keep the requested rdma_rw ctxt count low.
>
> Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 5940a56023d1..3d7f1413df02 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -406,12 +406,12 @@ static void svc_rdma_xprt_done(struct rpcrdma_notification *rn)
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, maxpayload;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -462,12 +462,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_bc_requests = 2;
>  	}
>
> -	/* Arbitrarily estimate the number of rw_ctxs needed for
> -	 * this transport. This is enough rw_ctxs to make forward
> -	 * progress even if the client is using one rkey per page
> -	 * in each Read chunk.
> +	/* Arbitrary estimate of the needed number of rdma_rw contexts.
>  	 */
> -	ctxts = 3 * RPCSVC_MAXPAGES;
> +	maxpayload = min(xprt->xpt_server->sv_max_payload,
> +			 RPCSVC_MAXPAYLOAD_RDMA);
> +	ctxts = newxprt->sc_max_requests * 3 *
> +		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
> +				  maxpayload >> PAGE_SHIFT);
> +
>  	newxprt->sc_sq_depth = rq_depth + ctxts;
>  	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
>  		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
> --
> 2.49.0
>
>
---end quoted text---
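
For readers following the arithmetic in the quoted commit message, here is a
rough user-space sketch of the two calculations. It is not kernel code. The
helper names (sq_len_estimate, roundup_pow2, sq_wqe_count) are hypothetical.
The second calculation is an assumption read off the quoted mlx5 debug line:
the three numbers are taken to be the requested WR count, a per-WQE size in
bytes, and a basic-block size in bytes, with the byte total rounded up to a
power of two before it is divided into blocks.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the SQ length estimate from the commit message,
 *   credits + backlog + factor * (3 * maxpages)
 * where 3 * maxpages is the old rdma_rw context count.
 */
static uint64_t sq_len_estimate(uint64_t credits, uint64_t backlog,
				uint64_t factor, uint64_t maxpages)
{
	uint64_t ctxts = 3 * maxpages;

	return credits + backlog + factor * ctxts;
}

/* Round v up to the next power of two (returns v if it already is one). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Assumption based on the quoted log line "(9326 * 256 / 64 -> 65536)":
 * the queue's byte size is rounded up to a power of two, then divided
 * by the basic-block size to get the slot count that is checked
 * against the device limit.
 */
static uint64_t sq_wqe_count(uint64_t max_send_wr, uint64_t wqe_size,
			     uint64_t bb_size)
{
	return roundup_pow2(max_send_wr * wqe_size) / bb_size;
}
```

Under these assumptions, sq_len_estimate(64, 10, 3, 260) gives 2414 and
sq_len_estimate(64, 10, 3, 1040) gives 9434, matching the commit message,
while sq_wqe_count(9326, 256, 64) gives 65536, which is above the 32768
limit in the log even though 9326 itself is not. That gap between the
requested WR count and the slot count the driver checks is the part a ULP
cannot see today.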