public inbox for linux-rdma@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: cel@kernel.org, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
	Chuck Lever <chuck.lever@oracle.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v5 01/19] svcrdma: Reduce the number of rdma_rw contexts per-QP
Date: Tue, 13 May 2025 07:00:54 -0500
Message-ID: <f3d7a799ee28c821db1fb946640c270a89690174.camel@kernel.org>
In-Reply-To: <20250509190354.5393-2-cel@kernel.org>

On Fri, 2025-05-09 at 15:03 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> There is an upper bound on the number of rdma_rw contexts that can
> be created per QP.
> 
> This upper bound exists because rdma_create_qp() adds one or more
> additional SQEs for each ctxt that the ULP requests via
> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is
> roughly the sum of qp_attr.cap.max_send_wr and a factor times
> qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
> on whether MR operations are required before RDMA Reads.
> 
> This limit is not visible to RDMA consumers via dev->attrs. When the
> limit is surpassed, QP creation fails with -ENOMEM. For example:
> 
> svcrdma's estimate of the number of rdma_rw contexts it needs is
> three times RPCSVC_MAXPAGES. When RPCSVC_MAXPAGES is about 260, the
> internally-computed SQ length should be:
> 
> 64 credits + 10 backlog + 3 * (3 * 260) = 2414
> 
> Which is well below the advertised qp_max_wr of 32768.
> 
> If the maximum payload is increased to 4MB, RPCSVC_MAXPAGES grows
> to about 1040 pages:
> 
> 64 credits + 10 backlog + 3 * (3 * 1040) = 9434
> 
> However, QP creation fails. Dynamic printk for mlx5 shows:
> 
> calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
> 
> Although 9326 is still far below qp_max_wr, QP creation
> nevertheless fails.
> 
> Because the total SQ length calculation is opaque to RDMA consumers,
> there doesn't seem to be much that can be done about this except for
> consumers to try to keep the requested rdma_rw ctxt count low.
> 
> Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
> Reviewed-by: NeilBrown <neil@brown.name>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 5940a56023d1..3d7f1413df02 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -406,12 +406,12 @@ static void svc_rdma_xprt_done(struct rpcrdma_notification *rn)
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, maxpayload;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -462,12 +462,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_bc_requests = 2;
>  	}
>  
> -	/* Arbitrarily estimate the number of rw_ctxs needed for
> -	 * this transport. This is enough rw_ctxs to make forward
> -	 * progress even if the client is using one rkey per page
> -	 * in each Read chunk.
> +	/* Arbitrary estimate of the needed number of rdma_rw contexts.
>  	 */
> -	ctxts = 3 * RPCSVC_MAXPAGES;
> +	maxpayload = min(xprt->xpt_server->sv_max_payload,
> +			 RPCSVC_MAXPAYLOAD_RDMA);
> +	ctxts = newxprt->sc_max_requests * 3 *
> +		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
> +				  maxpayload >> PAGE_SHIFT);
> +
>  	newxprt->sc_sq_depth = rq_depth + ctxts;
>  	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
>  		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
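
For reference, the extra SQEs come from the core rw layer when the QP is
set up. A simplified sketch of that accounting (hand-written here for
illustration; the real logic lives in drivers/infiniband/core/rw.c and
the factor is device-dependent):

	#include <rdma/ib_verbs.h>

	/* Sketch: how the rw layer inflates the Send Queue at QP
	 * creation. factor is 1 when plain RDMA Read/Write WRs are
	 * enough, and up to 3 when REG and INV WRs must bracket each
	 * RDMA Read.
	 */
	static void sq_inflation_sketch(struct ib_qp_init_attr *attr,
					unsigned int factor)
	{
		/* The provider sees only the sum, so a large
		 * max_rdma_ctxs request can exceed its internal SQ
		 * size limit even when max_send_wr alone would fit.
		 */
		attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;
	}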

Reviewed-by: Jeff Layton <jlayton@kernel.org>
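
To put illustrative numbers on the new estimate (the values below are
hypothetical; the real ones depend on the device and on the credit limit
negotiated per connection), this is roughly the calculation the patch
performs:

	#include <rdma/rw.h>	/* rdma_rw_mr_factor() */

	/* Sketch of the per-transport rw_ctx estimate after this
	 * patch; not the actual svc_rdma_accept() code.
	 */
	static unsigned int estimate_ctxts(struct ib_device *dev, u32 port_num,
					   unsigned int max_requests,
					   unsigned long maxpayload)
	{
		/* MRs needed to map one maximum-sized payload */
		unsigned int mr_factor =
			rdma_rw_mr_factor(dev, port_num,
					  maxpayload >> PAGE_SHIFT);

		/* Mirror the patch's factor-of-three margin */
		return max_requests * 3 * mr_factor;
	}

For example, 64 credits with an mr_factor of 4 would give 64 * 3 * 4 =
768 contexts, and it is this per-connection figure, rather than a
multiple of the global RPCSVC_MAXPAGES, that now feeds into sc_sq_depth.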


Thread overview: 28+ messages
2025-05-09 19:03 [PATCH v5 00/19] Allocate payload arrays dynamically cel
2025-05-09 19:03 ` [PATCH v5 01/19] svcrdma: Reduce the number of rdma_rw contexts per-QP cel
2025-05-13 12:00   ` Jeff Layton [this message]
2025-05-09 19:03 ` [PATCH v5 02/19] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
2025-05-09 19:03 ` [PATCH v5 03/19] sunrpc: Remove backchannel check in svc_init_buffer() cel
2025-05-09 19:03 ` [PATCH v5 04/19] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
2025-05-09 19:03 ` [PATCH v5 05/19] sunrpc: Replace the rq_bvec " cel
2025-05-09 19:03 ` [PATCH v5 06/19] NFSD: Use rqstp->rq_bvec in nfsd_iter_read() cel
2025-05-09 19:03 ` [PATCH v5 07/19] NFSD: De-duplicate the svc_fill_write_vector() call sites cel
2025-05-13 12:02   ` Jeff Layton
2025-05-09 19:03 ` [PATCH v5 08/19] SUNRPC: Export xdr_buf_to_bvec() cel
2025-05-09 19:03 ` [PATCH v5 09/19] NFSD: Use rqstp->rq_bvec in nfsd_iter_write() cel
2025-05-09 19:03 ` [PATCH v5 10/19] SUNRPC: Remove svc_fill_write_vector() cel
2025-05-09 19:03 ` [PATCH v5 11/19] SUNRPC: Remove svc_rqst :: rq_vec cel
2025-05-09 19:03 ` [PATCH v5 12/19] sunrpc: Adjust size of socket's receive page array dynamically cel
2025-05-09 19:03 ` [PATCH v5 13/19] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
2025-05-09 19:03 ` [PATCH v5 14/19] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
2025-05-09 19:03 ` [PATCH v5 15/19] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
2025-05-09 19:03 ` [PATCH v5 16/19] NFSD: Remove NFSD_BUFSIZE cel
2025-05-09 19:03 ` [PATCH v5 17/19] NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro cel
2025-05-09 19:03 ` [PATCH v5 18/19] NFSD: Add a "default" block size cel
2025-05-09 19:03 ` [PATCH v5 19/19] SUNRPC: Bump the maximum payload size for the server cel
2025-05-12 16:44   ` Aurélien Couderc
2025-05-12 18:09     ` Chuck Lever
2025-05-13  8:42       ` Aurélien Couderc
2025-05-13 12:08         ` Chuck Lever
2025-05-14  0:11         ` NeilBrown
2025-05-13 12:05 ` [PATCH v5 00/19] Allocate payload arrays dynamically Jeff Layton
