From: Guixin Liu <kanie@linux.alibaba.com>
To: Sagi Grimberg <sagi@grimberg.me>,
Max Gurtovoy <mgurtovoy@nvidia.com>,
hch@lst.de, kbusch@kernel.org, kch@nvidia.com, axboe@kernel.dk
Cc: linux-nvme@lists.infradead.org
Subject: Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize
Date: Fri, 22 Dec 2023 14:58:58 +0800
Message-ID: <436efebd-ab7e-4b23-9be0-a316884552ca@linux.alibaba.com>
In-Reply-To: <77df6829-3a14-49a1-82e5-f3389ba47d86@grimberg.me>
On 2023/12/21 03:27, Sagi Grimberg wrote:
>
>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
>>>>  			ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1);
>>>>  	}
>>>> -	if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) {
>>>> +	ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr /
>>>> +			(NVME_RDMA_SEND_WR_FACTOR + 1);
>>>
>>> rdma_dev_max_qsize is a better name.
>>>
>>> Also, you can drop the RFC for the next submission.
>>>
>>
>> Sagi,
>> I don't feel comfortable with these patches.
>
> Well, good that you're speaking up then ;)
>
>> First I would like to understand the need for it.
>
> I assumed that he stumbled on a device that did not support the
> existing max of 128 nvme commands (which is 384 rdma wrs for the qp).
>
The situation is that I need a queue depth greater than 128.
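
For reference, the cap the patch computes boils down to the arithmetic
below (a minimal sketch, not the actual driver code; it assumes
NVME_RDMA_SEND_WR_FACTOR is 3 for the MR/SEND/INV work requests posted
per command, which matches Sagi's 128-command / 384-WR figure, with the
+1 in the divisor leaving slack for the drain WR):

#include <stdio.h>

/* Assumption: 3 send WRs per NVMe command (MR, SEND, INV), as in the
 * host driver; the +1 in the divisor leaves room for the drain WR. */
#define NVME_RDMA_SEND_WR_FACTOR	3

static int rdma_dev_max_qsize(int max_qp_wr)
{
	return max_qp_wr / (NVME_RDMA_SEND_WR_FACTOR + 1);
}

int main(void)
{
	/* hypothetical device reporting max_qp_wr = 4096 */
	printf("usable queue size: %d\n", rdma_dev_max_qsize(4096));
	/* 128 commands consume 128 * 3 = 384 WRs */
	printf("WRs for 128 commands: %d\n", 128 * NVME_RDMA_SEND_WR_FACTOR);
	return 0;
}
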
>> Second, a QP WR can be constructed from one or more WQEs, and a WQE
>> can be constructed from one or more WQEBBs. max_qp_wr doesn't take
>> this into account.
>
> Well, it is not taken into account now either with the existing magic
> limit in nvmet. The rdma limits reporting mechanism was and still is
> unusable.
>
> I would expect a device that has different sizes for different work
> items to report a max_qp_wr that accounts for the largest work element
> the device supports, so that it is universally correct.
>
> The fact that max_qp_wr means the maximum number of slots in a QP,
> while at the same time different work requests can arbitrarily use any
> number of slots without anyone ever knowing, makes it pretty much
> impossible to use reliably.
>
> Maybe rdma device attributes need a new attribute called
> universal_max_qp_wr that is going to actually be reliable and not
> guess-work?
I see, so max_qp_wr is not as reliable as I imagined. Is there any
other way to get a queue depth greater than 128 without changing
NVME_RDMA_MAX_QUEUE_SIZE?
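
In case it is useful for comparing devices, what a given HCA reports
can be dumped from userspace with libibverbs (a sketch under the same
factor+1 assumption as the patch; ibv_query_device() and
struct ibv_device_attr.max_qp_wr are standard verbs API):

/* build: cc probe_qp_wr.c -o probe_qp_wr -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
	int i, num;
	struct ibv_device **list = ibv_get_device_list(&num);

	if (!list)
		return 1;
	for (i = 0; i < num; i++) {
		struct ibv_context *ctx = ibv_open_device(list[i]);
		struct ibv_device_attr attr;

		if (ctx && !ibv_query_device(ctx, &attr))
			printf("%s: max_qp_wr=%d -> queue size cap ~%d\n",
			       ibv_get_device_name(list[i]),
			       attr.max_qp_wr, attr.max_qp_wr / 4);
		if (ctx)
			ibv_close_device(ctx);
	}
	ibv_free_device_list(list);
	return 0;
}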