From: Chuck Lever <cel@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>, Christoph Hellwig <hch@infradead.org>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	Anna Schumaker <anna@kernel.org>,
	linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
	Chuck Lever <chuck.lever@oracle.com>,
	Leon Romanovsky <leon@kernel.org>
Subject: Re: [PATCH v4 01/14] svcrdma: Reduce the number of rdma_rw contexts per-QP
Date: Tue, 6 May 2025 10:13:00 -0400
Message-ID: <be740f28-8d68-400c-85bc-81cc4e48ccc6@kernel.org>
In-Reply-To: <20250506135536.GH2260621@ziepe.ca>

On 5/6/25 9:55 AM, Jason Gunthorpe wrote:
> On Tue, May 06, 2025 at 06:40:25AM -0700, Christoph Hellwig wrote:
>> On Tue, May 06, 2025 at 10:17:22AM -0300, Jason Gunthorpe wrote:
>>> On Tue, May 06, 2025 at 06:08:59AM -0700, Christoph Hellwig wrote:
>>>> On Mon, Apr 28, 2025 at 03:36:49PM -0400, cel@kernel.org wrote:
>>>>> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
>>>>> the order of the sum of qp_attr.cap.max_send_wr and a factor times
>>>>> qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
>>>>> on whether MR operations are required before RDMA Reads.
>>>>>
>>>>> This limit is not visible to RDMA consumers via dev->attrs. When the
>>>>> limit is surpassed, QP creation fails with -ENOMEM. For example:
>>>>
>>>> Can we find a way to expose this limit from the HCA drivers and the
>>>> RDMA core?
>>>
>>> Shouldn't it be max_qp_wr?
>>
>> Does that allow for arbitrary combination of different WRs?  
> 
> I think it is supposed to be the maximum QP WR depth you can create..
> 
> A QP shouldn't behave differently depending on the WR operation, each
> one takes one WR entry.
> 
> Chuck do you know differently?

qp_attr.cap.max_rdma_ctxs reserves a number of SQEs over and above
qp_attr.cap.max_send_wr. The sum of those two cannot exceed max_qp_wr,
of course.

But there is a multiplier, which depends on whether the device needs a
registration WR and an invalidation WR in addition to each RDMA Read WR.

Further, consider calc_sq_size() in drivers/infiniband/hw/mlx5/qp.c:

        wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size);
        qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB;
        if (qp->sq.wqe_cnt > (1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz))) {
                mlx5_ib_dbg(dev, "send queue size (%d * %d / %d -> %d) exceeds limits(%d)\n",
                            attr->cap.max_send_wr, wqe_size, MLX5_SEND_WQE_BB,
                            qp->sq.wqe_cnt,
                            1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz));
                return -ENOMEM;
        }

So when svcrdma requests a large number of rdma_rw contexts on top of a
Send Queue size of 135, QP creation in svc_rdma_accept() fails with
-ENOMEM and the debug message above is emitted.

In this patch I'm trying to include the reg/inv multiplier in the
calculation, but that doesn't seem to be enough to make "accept"
reliable, IMO because calc_sq_size() additionally scales max_send_wr
by the per-WQE size and rounds the result up to a power of two.

-- 
Chuck Lever

Thread overview: 52+ messages
2025-04-28 19:36 [PATCH v4 00/14] Allocate payload arrays dynamically cel
2025-04-28 19:36 ` [PATCH v4 01/14] svcrdma: Reduce the number of rdma_rw contexts per-QP cel
2025-05-06 13:08   ` Christoph Hellwig
2025-05-06 13:17     ` Jason Gunthorpe
2025-05-06 13:40       ` Christoph Hellwig
2025-05-06 13:55         ` Jason Gunthorpe
2025-05-06 14:13           ` Chuck Lever [this message]
2025-05-06 14:17             ` Jason Gunthorpe
2025-05-06 14:19               ` Chuck Lever
2025-05-06 14:22                 ` Jason Gunthorpe
2025-05-08  8:41                   ` Edward Srouji
2025-05-08 12:43                     ` Jason Gunthorpe
2025-05-10 23:12                       ` Edward Srouji
2025-04-28 19:36 ` [PATCH v4 02/14] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
2025-05-06 13:10   ` Christoph Hellwig
2025-04-28 19:36 ` [PATCH v4 03/14] sunrpc: Remove backchannel check in svc_init_buffer() cel
2025-05-06 13:11   ` Christoph Hellwig
2025-04-28 19:36 ` [PATCH v4 04/14] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
2025-04-30  4:53   ` NeilBrown
2025-04-28 19:36 ` [PATCH v4 05/14] sunrpc: Replace the rq_vec " cel
2025-05-06 13:29   ` Christoph Hellwig
2025-05-06 16:31     ` Chuck Lever
2025-05-07  7:34       ` Christoph Hellwig
2025-04-28 19:36 ` [PATCH v4 06/14] sunrpc: Replace the rq_bvec " cel
2025-04-28 19:36 ` [PATCH v4 07/14] sunrpc: Adjust size of socket's receive page array dynamically cel
2025-04-28 19:36 ` [PATCH v4 08/14] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
2025-05-06 13:31   ` Christoph Hellwig
2025-05-06 15:20     ` Chuck Lever
2025-05-07  7:40       ` Christoph Hellwig
2025-04-28 19:36 ` [PATCH v4 09/14] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
2025-04-28 19:36 ` [PATCH v4 10/14] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
2025-04-28 19:36 ` [PATCH v4 11/14] NFSD: Remove NFSD_BUFSIZE cel
2025-04-28 21:03   ` Jeff Layton
2025-05-06 13:32   ` Christoph Hellwig
2025-04-28 19:37 ` [PATCH v4 12/14] NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro cel
2025-05-06 13:33   ` Christoph Hellwig
2025-04-28 19:37 ` [PATCH v4 13/14] NFSD: Add a "default" block size cel
2025-04-28 21:07   ` Jeff Layton
2025-04-28 19:37 ` [PATCH v4 14/14] SUNRPC: Bump the maximum payload size for the server cel
2025-04-28 21:08   ` Jeff Layton
2025-04-29 15:44     ` Chuck Lever
2025-05-06 13:34   ` Christoph Hellwig
2025-05-06 13:52     ` Chuck Lever
2025-05-06 13:54       ` Jeff Layton
2025-05-06 13:59         ` Chuck Lever
2025-05-07  7:42       ` Christoph Hellwig
2025-05-07 14:25         ` Chuck Lever
2025-04-29 13:06 ` [PATCH v4 00/14] Allocate payload arrays dynamically Zhu Yanjun
2025-04-29 13:41   ` Chuck Lever
2025-04-29 13:52     ` Zhu Yanjun
2025-04-30  5:11 ` NeilBrown
2025-04-30 12:45   ` Chuck Lever
