From: Jeff Layton <jlayton@kernel.org>
To: cel@kernel.org, NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
Chuck Lever <chuck.lever@oracle.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v5 01/19] svcrdma: Reduce the number of rdma_rw contexts per-QP
Date: Tue, 13 May 2025 07:00:54 -0500
Message-ID: <f3d7a799ee28c821db1fb946640c270a89690174.camel@kernel.org>
In-Reply-To: <20250509190354.5393-2-cel@kernel.org>
On Fri, 2025-05-09 at 15:03 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> There is an upper bound on the number of rdma_rw contexts that can
> be created per QP.
>
> This invisible upper bound is because rdma_create_qp() adds one or
> more additional SQEs for each ctxt that the ULP requests via
> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
> the order of the sum of qp_attr.cap.max_send_wr and a factor times
> qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
> on whether MR operations are required before RDMA Reads.
>
> This limit is not visible to RDMA consumers via dev->attrs. When the
> limit is surpassed, QP creation fails with -ENOMEM. For example:
>
> svcrdma's estimate of the number of rdma_rw contexts it needs is
> three times the number of pages in RPCSVC_MAXPAGES. When MAXPAGES
> is about 260, the internally-computed SQ length should be:
>
> 64 credits + 10 backlog + 3 * (3 * 260) = 2414
>
> Which is well below the advertised qp_max_wr of 32768.
>
> If RPCSVC_MAXPAGES is increased to 4MB, that's 1040 pages:
>
> 64 credits + 10 backlog + 3 * (3 * 1040) = 9434
>
> However, QP creation fails. Dynamic printk for mlx5 shows:
>
> calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
>
> Although 9326 is still far below qp_max_wr, QP creation still
> fails.
>
> Because the total SQ length calculation is opaque to RDMA consumers,
> there doesn't seem to be much that can be done about this except for
> consumers to try to keep the requested rdma_rw ctxt count low.
>
> Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
> Reviewed-by: NeilBrown <neil@brown.name>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 5940a56023d1..3d7f1413df02 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -406,12 +406,12 @@ static void svc_rdma_xprt_done(struct rpcrdma_notification *rn)
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, maxpayload;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -462,12 +462,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_bc_requests = 2;
>  	}
>
> -	/* Arbitrarily estimate the number of rw_ctxs needed for
> -	 * this transport. This is enough rw_ctxs to make forward
> -	 * progress even if the client is using one rkey per page
> -	 * in each Read chunk.
> +	/* Arbitrary estimate of the needed number of rdma_rw contexts.
>  	 */
> -	ctxts = 3 * RPCSVC_MAXPAGES;
> +	maxpayload = min(xprt->xpt_server->sv_max_payload,
> +			 RPCSVC_MAXPAYLOAD_RDMA);
> +	ctxts = newxprt->sc_max_requests * 3 *
> +		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
> +				  maxpayload >> PAGE_SHIFT);
> +
>  	newxprt->sc_sq_depth = rq_depth + ctxts;
>  	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
>  		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
Reviewed-by: Jeff Layton <jlayton@kernel.org>
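
For readers checking the numbers in the quoted commit message: the two
Send Queue estimates and the size that mlx5 reports can be reproduced
with a short C sketch. The helper names below are illustrative only and
are not kernel functions. The power-of-two round-up is an assumption
inferred from the logged result (9326 * 256 / 64 -> 65536), not taken
from the driver source.

```c
/*
 * Illustrative helpers only; none of these exist in the kernel.
 * They mirror the arithmetic written out in the commit message.
 */

/* credits + backlog + factor * (3 * maxpages), as in the commit message */
static unsigned int sq_estimate(unsigned int credits, unsigned int backlog,
				unsigned int factor, unsigned int maxpages)
{
	return credits + backlog + factor * (3 * maxpages);
}

/* Smallest power of two that is >= n (for n >= 1) */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Assumed model of the value in the mlx5 log line:
 * (max_send_wr * wqe_size / basic_block), rounded up to a power of two.
 */
static unsigned int logged_sq_size(unsigned int max_send_wr,
				   unsigned int wqe_size,
				   unsigned int basic_block)
{
	return roundup_pow2(max_send_wr * wqe_size / basic_block);
}
```

Under these assumptions, sq_estimate(64, 10, 3, 260) gives 2414 and
sq_estimate(64, 10, 3, 1040) gives 9434, matching the commit message,
while logged_sq_size(9326, 256, 64) gives 65536, which is above the
32768 limit shown in the log.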