* [RFC PATCH V2 0/2] *** use rdma device capability to limit queue size *** @ 2023-12-19 7:32 Guixin Liu 2023-12-19 7:32 ` [RFC PATCH V2 1/2] nvmet: rdma: utilize ib_device capability for setting max_queue_size Guixin Liu 2023-12-19 7:32 ` [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize Guixin Liu 0 siblings, 2 replies; 11+ messages in thread From: Guixin Liu @ 2023-12-19 7:32 UTC (permalink / raw) To: hch, sagi, kbusch, kch, axboe; +Cc: linux-nvme Hi guys, Currently, the queue size of NVMe over RDMA is limited to a depth of 128; we can use the RDMA device capability to limit it instead. Changes from v1 to v2: - use nvmet_ctrl's port to find the RDMA nvmet_rdma_queue without changing get_max_queue_size's parameters. - clarify that the queue depth will be capped at 1024, rather than being allowed to grow very large. Guixin Liu (2): nvmet: rdma: utilize ib_device capability for setting max_queue_size nvme: rdma: use ib_device's max_qp_wr to limit sqsize drivers/nvme/host/rdma.c | 14 ++++++++------ drivers/nvme/target/rdma.c | 17 ++++++++++++++++- include/linux/nvme-rdma.h | 2 +- 3 files changed, 25 insertions(+), 8 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH V2 1/2] nvmet: rdma: utilize ib_device capability for setting max_queue_size 2023-12-19 7:32 [RFC PATCH V2 0/2] *** use rdma device capability to limit queue size *** Guixin Liu @ 2023-12-19 7:32 ` Guixin Liu 2023-12-19 7:32 ` [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize Guixin Liu 1 sibling, 0 replies; 11+ messages in thread From: Guixin Liu @ 2023-12-19 7:32 UTC (permalink / raw) To: hch, sagi, kbusch, kch, axboe; +Cc: linux-nvme Respond with the smaller value between 1024 and the ib_device's max_qp_wr as the RDMA max queue size. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> --- drivers/nvme/target/rdma.c | 17 ++++++++++++++++- include/linux/nvme-rdma.h | 2 ++ 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 4597bca..d546203 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -2002,7 +2002,22 @@ static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl) static u16 nvmet_rdma_get_max_queue_size(const struct nvmet_ctrl *ctrl) { - return NVME_RDMA_MAX_QUEUE_SIZE; + struct nvmet_rdma_queue *queue; + struct nvmet_port *nport = ctrl->port; + int max_qp_wr = 0; + + /* Here nvmet_ctrl's port is already set. 
*/ + mutex_lock(&nvmet_rdma_queue_mutex); + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { + if (queue->port == nport) { + max_qp_wr = queue->dev->device->attrs.max_qp_wr; + break; + } + } + mutex_unlock(&nvmet_rdma_queue_mutex); + + return (u16)min_t(int, NVMET_QUEUE_SIZE, + max_qp_wr / (NVME_RDMA_SEND_WR_FACTOR + 1)); } static const struct nvmet_fabrics_ops nvmet_rdma_ops = { diff --git a/include/linux/nvme-rdma.h b/include/linux/nvme-rdma.h index 4dd7e6f..c19858b 100644 --- a/include/linux/nvme-rdma.h +++ b/include/linux/nvme-rdma.h @@ -8,6 +8,8 @@ #define NVME_RDMA_MAX_QUEUE_SIZE 128 +#define NVME_RDMA_SEND_WR_FACTOR 3 /* MR, SEND, INV */ + enum nvme_rdma_cm_fmt { NVME_RDMA_CM_FMT_1_0 = 0x0, }; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-19 7:32 [RFC PATCH V2 0/2] *** use rdma device capability to limit queue size *** Guixin Liu 2023-12-19 7:32 ` [RFC PATCH V2 1/2] nvmet: rdma: utilize ib_device capability for setting max_queue_size Guixin Liu @ 2023-12-19 7:32 ` Guixin Liu 2023-12-20 9:17 ` Sagi Grimberg 1 sibling, 1 reply; 11+ messages in thread From: Guixin Liu @ 2023-12-19 7:32 UTC (permalink / raw) To: hch, sagi, kbusch, kch, axboe; +Cc: linux-nvme Currently, the host is limited to creating queues with a depth of 128. To enable larger queue sizes, constrain the sqsize based on the ib_device's max_qp_wr capability. In addition, the queue size is restricted between 16 and 1024 in nvmf_parse_options(), so the final queue depth will not be bigger than 1024. Also remove the now-unused NVME_RDMA_MAX_QUEUE_SIZE macro. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> --- drivers/nvme/host/rdma.c | 14 ++++++++------ include/linux/nvme-rdma.h | 2 -- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 81e2621..982f3e4 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -489,8 +489,7 @@ static int nvme_rdma_create_cq(struct ib_device *ibdev, static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) { struct ib_device *ibdev; - const int send_wr_factor = 3; /* MR, SEND, INV */ - const int cq_factor = send_wr_factor + 1; /* + RECV */ + const int cq_factor = NVME_RDMA_SEND_WR_FACTOR + 1; /* + RECV */ int ret, pages_per_mr; queue->device = nvme_rdma_find_get_device(queue->cm_id); @@ -508,7 +507,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) if (ret) goto out_put_dev; - ret = nvme_rdma_create_qp(queue, send_wr_factor); + ret = nvme_rdma_create_qp(queue, NVME_RDMA_SEND_WR_FACTOR); if (ret) goto out_destroy_ib_cq; @@ -1006,6 +1005,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new) { int ret; bool changed; + int ib_max_qsize; ret = nvme_rdma_configure_admin_queue(ctrl, new); if (ret) @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new) ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); } - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / + (NVME_RDMA_SEND_WR_FACTOR + 1); + if (ctrl->ctrl.sqsize + 1 > ib_max_qsize) { dev_warn(ctrl->ctrl.device, "ctrl sqsize %u > max queue size %u, clamping down\n", - ctrl->ctrl.sqsize + 1, NVME_RDMA_MAX_QUEUE_SIZE); - ctrl->ctrl.sqsize = NVME_RDMA_MAX_QUEUE_SIZE - 1; + ctrl->ctrl.sqsize + 1, ib_max_qsize); + ctrl->ctrl.sqsize = ib_max_qsize - 1; } if (ctrl->ctrl.sqsize + 1 > ctrl->ctrl.maxcmd) { diff --git a/include/linux/nvme-rdma.h b/include/linux/nvme-rdma.h index c19858b..67ee770 100644 --- a/include/linux/nvme-rdma.h +++ b/include/linux/nvme-rdma.h @@ -6,8 +6,6 @@ #ifndef _LINUX_NVME_RDMA_H #define _LINUX_NVME_RDMA_H -#define NVME_RDMA_MAX_QUEUE_SIZE 128 - #define NVME_RDMA_SEND_WR_FACTOR 3 /* MR, SEND, INV */ enum nvme_rdma_cm_fmt { -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-19 7:32 ` [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize Guixin Liu @ 2023-12-20 9:17 ` Sagi Grimberg 2023-12-20 10:52 ` Max Gurtovoy 0 siblings, 1 reply; 11+ messages in thread From: Sagi Grimberg @ 2023-12-20 9:17 UTC (permalink / raw) To: Guixin Liu, hch, kbusch, kch, axboe; +Cc: linux-nvme On 12/19/23 09:32, Guixin Liu wrote: > Currently, the host is limited to creating queues with a depth of > 128. To enable larger queue sizes, constrain the sqsize based on > the ib_device's max_qp_wr capability. > In addition, the queue size is restricted between 16 and 1024 in > nvmf_parse_options(), so the final value of queue depth will not > biger than 1024. > > And also remove unused NVME_RDMA_MAX_QUEUE_SIZE macro. > > Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> > --- > drivers/nvme/host/rdma.c | 14 ++++++++------ > include/linux/nvme-rdma.h | 2 -- > 2 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c > index 81e2621..982f3e4 100644 > --- a/drivers/nvme/host/rdma.c > +++ b/drivers/nvme/host/rdma.c > @@ -489,8 +489,7 @@ static int nvme_rdma_create_cq(struct ib_device *ibdev, > static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) > { > struct ib_device *ibdev; > - const int send_wr_factor = 3; /* MR, SEND, INV */ > - const int cq_factor = send_wr_factor + 1; /* + RECV */ > + const int cq_factor = NVME_RDMA_SEND_WR_FACTOR + 1; /* + RECV */ > int ret, pages_per_mr; > > queue->device = nvme_rdma_find_get_device(queue->cm_id); > @@ -508,7 +507,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) > if (ret) > goto out_put_dev; > > - ret = nvme_rdma_create_qp(queue, send_wr_factor); > + ret = nvme_rdma_create_qp(queue, NVME_RDMA_SEND_WR_FACTOR); > if (ret) > goto out_destroy_ib_cq; > > @@ -1006,6 +1005,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl 
*ctrl, bool new) > { > int ret; > bool changed; > + int ib_max_qsize; > > ret = nvme_rdma_configure_admin_queue(ctrl, new); > if (ret) > @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new) > ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); > } > > - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { > + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / > + (NVME_RDMA_SEND_WR_FACTOR + 1); rdma_dev_max_qsize is a better name. Also, you can drop the RFC for the next submission. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-20 9:17 ` Sagi Grimberg @ 2023-12-20 10:52 ` Max Gurtovoy 2023-12-20 19:27 ` Sagi Grimberg 0 siblings, 1 reply; 11+ messages in thread From: Max Gurtovoy @ 2023-12-20 10:52 UTC (permalink / raw) To: Sagi Grimberg, Guixin Liu, hch, kbusch, kch, axboe; +Cc: linux-nvme Hi Guixin Liu, On 20/12/2023 11:17, Sagi Grimberg wrote: > > > On 12/19/23 09:32, Guixin Liu wrote: >> Currently, the host is limited to creating queues with a depth of >> 128. To enable larger queue sizes, constrain the sqsize based on >> the ib_device's max_qp_wr capability. >> In addition, the queue size is restricted between 16 and 1024 in >> nvmf_parse_options(), so the final value of queue depth will not >> biger than 1024. can you please explain why this patch series is needed ? what is the added value you see with it ? >> >> And also remove unused NVME_RDMA_MAX_QUEUE_SIZE macro. >> >> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> >> --- >> drivers/nvme/host/rdma.c | 14 ++++++++------ >> include/linux/nvme-rdma.h | 2 -- >> 2 files changed, 8 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c >> index 81e2621..982f3e4 100644 >> --- a/drivers/nvme/host/rdma.c >> +++ b/drivers/nvme/host/rdma.c >> @@ -489,8 +489,7 @@ static int nvme_rdma_create_cq(struct ib_device >> *ibdev, >> static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) >> { >> struct ib_device *ibdev; >> - const int send_wr_factor = 3; /* MR, SEND, INV */ >> - const int cq_factor = send_wr_factor + 1; /* + RECV */ >> + const int cq_factor = NVME_RDMA_SEND_WR_FACTOR + 1; /* + RECV */ >> int ret, pages_per_mr; >> queue->device = nvme_rdma_find_get_device(queue->cm_id); >> @@ -508,7 +507,7 @@ static int nvme_rdma_create_queue_ib(struct >> nvme_rdma_queue *queue) >> if (ret) >> goto out_put_dev; >> - ret = nvme_rdma_create_qp(queue, send_wr_factor); >> + ret = nvme_rdma_create_qp(queue, 
NVME_RDMA_SEND_WR_FACTOR); >> if (ret) >> goto out_destroy_ib_cq; >> @@ -1006,6 +1005,7 @@ static int nvme_rdma_setup_ctrl(struct >> nvme_rdma_ctrl *ctrl, bool new) >> { >> int ret; >> bool changed; >> + int ib_max_qsize; >> ret = nvme_rdma_configure_admin_queue(ctrl, new); >> if (ret) >> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >> nvme_rdma_ctrl *ctrl, bool new) >> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >> } >> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >> + (NVME_RDMA_SEND_WR_FACTOR + 1); > > rdma_dev_max_qsize is a better name. > > Also, you can drop the RFC for the next submission. > Sagi, I don't feel comfortable with these patches. First I would like to understand the need for it. Second, the QP WR can be constructed from one or more WQEs and the WQEs can be constructed from one or more WQEBBs. The max_qp_wr doesn't take it into account. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-20 10:52 ` Max Gurtovoy @ 2023-12-20 19:27 ` Sagi Grimberg 2023-12-22 6:58 ` Guixin Liu 0 siblings, 1 reply; 11+ messages in thread From: Sagi Grimberg @ 2023-12-20 19:27 UTC (permalink / raw) To: Max Gurtovoy, Guixin Liu, hch, kbusch, kch, axboe; +Cc: linux-nvme >>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>> nvme_rdma_ctrl *ctrl, bool new) >>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>> } >>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >> >> rdma_dev_max_qsize is a better name. >> >> Also, you can drop the RFC for the next submission. >> > > Sagi, > I don't feel comfortable with these patches. Well, good that you're speaking up then ;) > First I would like to understand the need for it. I assumed that he stumbled on a device that did not support the existing max of 128 nvme commands (which is 384 rdma wrs for the qp). > Second, the QP WR can be constructed from one or more WQEs and the WQEs > can be constructed from one or more WQEBBs. The max_qp_wr doesn't take > it into account. Well, it is not taken into account now either with the existing magic limit in nvmet. The rdma limits reporting mechanism was and still is unusable. I would expect a device that has different sizes for different work items to report max_qp_wr accounting for the largest work element that the device supports, so it is universally correct. The fact that max_qp_wr means the maximum number of slots in a qp and at the same time different work requests can arbitrarily use any number of slots without anyone ever knowing, makes it pretty much impossible to use reliably. Maybe rdma device attributes need a new attribute called universal_max_qp_wr that is going to actually be reliable and not guess-work? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-20 19:27 ` Sagi Grimberg @ 2023-12-22 6:58 ` Guixin Liu 2023-12-24 1:37 ` Max Gurtovoy 0 siblings, 1 reply; 11+ messages in thread From: Guixin Liu @ 2023-12-22 6:58 UTC (permalink / raw) To: Sagi Grimberg, Max Gurtovoy, hch, kbusch, kch, axboe; +Cc: linux-nvme On 2023/12/21 03:27, Sagi Grimberg wrote: > >>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>>> nvme_rdma_ctrl *ctrl, bool new) >>>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>>> } >>>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >>> >>> rdma_dev_max_qsize is a better name. >>> >>> Also, you can drop the RFC for the next submission. >>> >> >> Sagi, >> I don't feel comfortable with these patches. > > Well, good that you're speaking up then ;) > >> First I would like to understand the need for it. > > I assumed that he stumbled on a device that did not support the > existing max of 128 nvme commands (which is 384 rdma wrs for the qp). > The situation is that I need a queue depth greater than 128. >> Second, the QP WR can be constructed from one or more WQEs and the >> WQEs > can be constructed from one or more WQEBBs. The max_qp_wr >> doesn't take > it into account. > > Well, it is not taken into account now either with the existing magic > limit in nvmet. The rdma limits reporting mechanism was and still is > unusable. > > I would expect a device that has different size for different work > items to report max_qp_wr accounting for the largest work element that > the device supports, so it is universally correct. > > Maybe rdma device attributes need a new attribute called > universal_max_qp_wr that is going to actually be reliable and not > guess-work? I see, the max_qp_wr is not as reliable as I imagined. Is there another way to get a queue depth greater than 128 instead of changing NVME_RDMA_MAX_QUEUE_SIZE? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-22 6:58 ` Guixin Liu @ 2023-12-24 1:37 ` Max Gurtovoy 2023-12-25 8:40 ` Guixin Liu 0 siblings, 1 reply; 11+ messages in thread From: Max Gurtovoy @ 2023-12-24 1:37 UTC (permalink / raw) To: Guixin Liu, Sagi Grimberg, hch, kbusch, kch, axboe; +Cc: linux-nvme On 22/12/2023 8:58, Guixin Liu wrote: > > On 2023/12/21 03:27, Sagi Grimberg wrote: >> >>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>>>> nvme_rdma_ctrl *ctrl, bool new) >>>>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>>>> } >>>>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>>>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>>>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >>>> >>>> rdma_dev_max_qsize is a better name. >>>> >>>> Also, you can drop the RFC for the next submission. >>>> >>> >>> Sagi, >>> I don't feel comfortable with these patches. >> >> Well, good that you're speaking up then ;) >> >>> First I would like to understand the need for it. >> >> I assumed that he stumbled on a device that did not support the >> existing max of 128 nvme commands (which is 384 rdma wrs for the qp). >> > The situation is that I need a queue depth greater than 128. >>> Second, the QP WR can be constructed from one or more WQEs and the >>> WQEs can be constructed from one or more WQEBBs. The max_qp_wr >>> doesn't take it into account. >> >> Well, it is not taken into account now either with the existing magic >> limit in nvmet. The rdma limits reporting mechanism was and still is >> unusable. >> >> I would expect a device that has different size for different work >> items to report max_qp_wr accounting for the largest work element that >> the device supports, so it is universally correct. >> >> The fact that max_qp_wr means the maximum number of slots is a qp and >> at the same time different work requests can arbitrarily use any number >> of slots without anyone ever knowing, makes it pretty much impossible to >> use reliably. >> >> Maybe rdma device attributes need a new attribute called >> universal_max_qp_wr that is going to actually be reliable and not >> guess-work? > > I see, the max_qp_wr is not as reliable as I imagined. Is there any > another way to get a queue depth grater than 128 > > instead of changing NVME_RDMA_MAX_QUEUE_SIZE? > When I added this limit to RDMA transports it was to avoid a situation where a QP would fail to be created if someone asked for a large queue. I chose 128 since it was supported for all the RDMA adapters I've tested in my lab (mostly Mellanox adapters). For this queue depth we found that the performance is good enough and it would not improve if we increased the depth. Are you saying that you have a device that can provide better performance with qdepth > 128 ? What is the tested qdepth and what are the numbers you see with this qdepth ? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-24 1:37 ` Max Gurtovoy @ 2023-12-25 8:40 ` Guixin Liu 2023-12-25 8:59 ` Sagi Grimberg 0 siblings, 1 reply; 11+ messages in thread From: Guixin Liu @ 2023-12-25 8:40 UTC (permalink / raw) To: Max Gurtovoy, Sagi Grimberg, hch, kbusch, kch, axboe; +Cc: linux-nvme On 2023/12/24 09:37, Max Gurtovoy wrote: > > > On 22/12/2023 8:58, Guixin Liu wrote: >> >> On 2023/12/21 03:27, Sagi Grimberg wrote: >>> >>>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>>>>> nvme_rdma_ctrl *ctrl, bool new) >>>>>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>>>>> } >>>>>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>>>>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>>>>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >>>>> >>>>> rdma_dev_max_qsize is a better name. >>>>> >>>>> Also, you can drop the RFC for the next submission. >>>>> >>>> >>>> Sagi, >>>> I don't feel comfortable with these patches. >>> >>> Well, good that you're speaking up then ;) >>> >>>> First I would like to understand the need for it. >>> >>> I assumed that he stumbled on a device that did not support the >>> existing max of 128 nvme commands (which is 384 rdma wrs for the qp). >>> >> The situation is that I need a queue depth greater than 128. >>>> Second, the QP WR can be constructed from one or more WQEs and the >>>> WQEs can be constructed from one or more WQEBBs. The max_qp_wr >>>> doesn't take it into account. >>> >>> Well, it is not taken into account now either with the existing magic >>> limit in nvmet. The rdma limits reporting mechanism was and still is >>> unusable. >>> >>> I would expect a device that has different size for different work >>> items to report max_qp_wr accounting for the largest work element that >>> the device supports, so it is universally correct. >>> >>> The fact that max_qp_wr means the maximum number of slots is a qp and >>> at the same time different work requests can arbitrarily use any number >>> of slots without anyone ever knowing, makes it pretty much >>> impossible to >>> use reliably. >>> >>> Maybe rdma device attributes need a new attribute called >>> universal_max_qp_wr that is going to actually be reliable and not >>> guess-work? >> >> I see, the max_qp_wr is not as reliable as I imagined. Is there any >> another way to get a queue depth grater than 128 >> >> instead of changing NVME_RDMA_MAX_QUEUE_SIZE? >> > > When I added this limit to RDMA transports it was to avoid a situation > that a QP will fail to be created if one will ask a large queue. > > I choose 128 since it was supported for all the RDMA adapters I've > tested in my lab (mostly Mellanox adapters). > For this queue depth we found that the performance is good enough and > it will not be improved if we will increase the depth. > > Are you saying that you have a device that can provide better > performance with qdepth > 128 ? > What is the tested qdepth and what are the numbers you see with this > qdepth ? Yeah, you are right, the improvement is small (about 1%~2%); I did this only for a better benchmark. I still insist that using the capabilities of the RDMA device to determine the queue size is a better choice, but for now I have changed NVME_RDMA_MAX_QUEUE_SIZE to 256 for bidding. Best regards, Guixin Liu ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-25 8:40 ` Guixin Liu @ 2023-12-25 8:59 ` Sagi Grimberg 2023-12-25 12:36 ` Max Gurtovoy 0 siblings, 1 reply; 11+ messages in thread From: Sagi Grimberg @ 2023-12-25 8:59 UTC (permalink / raw) To: Guixin Liu, Max Gurtovoy, hch, kbusch, kch, axboe; +Cc: linux-nvme >>>>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>>>>>> nvme_rdma_ctrl *ctrl, bool new) >>>>>>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>>>>>> } >>>>>>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>>>>>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>>>>>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >>>>>> >>>>>> rdma_dev_max_qsize is a better name. >>>>>> >>>>>> Also, you can drop the RFC for the next submission. >>>>>> >>>>> >>>>> Sagi, >>>>> I don't feel comfortable with these patches. >>>> >>>> Well, good that you're speaking up then ;) >>>> >>>>> First I would like to understand the need for it. >>>> >>>> I assumed that he stumbled on a device that did not support the >>>> existing max of 128 nvme commands (which is 384 rdma wrs for the qp). >>>> >>> The situation is that I need a queue depth greater than 128. >>>>> Second, the QP WR can be constructed from one or more WQEs and the >>>>> WQEs can be constructed from one or more WQEBBs. The max_qp_wr >>>>> doesn't take it into account. >>>> >>>> Well, it is not taken into account now either with the existing magic >>>> limit in nvmet. The rdma limits reporting mechanism was and still is >>>> unusable. >>>> >>>> I would expect a device that has different size for different work >>>> items to report max_qp_wr accounting for the largest work element that >>>> the device supports, so it is universally correct. 
>>>> >>>> The fact that max_qp_wr means the maximum number of slots is a qp and >>>> at the same time different work requests can arbitrarily use any number >>>> of slots without anyone ever knowing, makes it pretty much >>>> impossible to >>>> use reliably. >>>> >>>> Maybe rdma device attributes need a new attribute called >>>> universal_max_qp_wr that is going to actually be reliable and not >>>> guess-work? >>> >>> I see, the max_qp_wr is not as reliable as I imagined. Is there any >>> another way to get a queue depth grater than 128 >>> >>> instead of changing NVME_RDMA_MAX_QUEUE_SIZE? >>> >> >> When I added this limit to RDMA transports it was to avoid a situation >> that a QP will fail to be created if one will ask a large queue. >> >> I choose 128 since it was supported for all the RDMA adapters I've >> tested in my lab (mostly Mellanox adapters). >> For this queue depth we found that the performance is good enough and >> it will not be improved if we will increase the depth. >> >> Are you saying that you have a device that can provide better >> performance with qdepth > 128 ? >> What is the tested qdepth and what are the numbers you see with this >> qdepth ? > > Yeah, you are right, the improvement is small(about %1~2%), I do this > only for better benchmark, Well, it doesn't come for free, you are essentially doubling the queue depth. I'm also assuming that you tested a single initiator and a single queue? > I still consist that using the capabilities of RDMA device to determine > the size of queue is a better choice, but now I change the > > NVME_RDMA_MAX_QUEUE_SIZE to 256 for bidding. Still doesn't change the fact that its a pure guess-work if it is supported by the device or not. Are you even able to create that queue depth with DIF workloads? Max, what is the maximum effective depth with DIF enabled? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize 2023-12-25 8:59 ` Sagi Grimberg @ 2023-12-25 12:36 ` Max Gurtovoy 0 siblings, 0 replies; 11+ messages in thread From: Max Gurtovoy @ 2023-12-25 12:36 UTC (permalink / raw) To: Sagi Grimberg, Guixin Liu, hch, kbusch, kch, axboe; +Cc: linux-nvme On 25/12/2023 10:59, Sagi Grimberg wrote: > >>>>>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct >>>>>>>> nvme_rdma_ctrl *ctrl, bool new) >>>>>>>> ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1); >>>>>>>> } >>>>>>>> - if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) { >>>>>>>> + ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr / >>>>>>>> + (NVME_RDMA_SEND_WR_FACTOR + 1); >>>>>>> >>>>>>> rdma_dev_max_qsize is a better name. >>>>>>> >>>>>>> Also, you can drop the RFC for the next submission. >>>>>>> >>>>>> >>>>>> Sagi, >>>>>> I don't feel comfortable with these patches. >>>>> >>>>> Well, good that you're speaking up then ;) >>>>> >>>>>> First I would like to understand the need for it. >>>>> >>>>> I assumed that he stumbled on a device that did not support the >>>>> existing max of 128 nvme commands (which is 384 rdma wrs for the qp). >>>>> >>>> The situation is that I need a queue depth greater than 128. >>>>>> Second, the QP WR can be constructed from one or more WQEs and the >>>>>> WQEs can be constructed from one or more WQEBBs. The max_qp_wr >>>>>> doesn't take it into account. >>>>> >>>>> Well, it is not taken into account now either with the existing magic >>>>> limit in nvmet. The rdma limits reporting mechanism was and still is >>>>> unusable. >>>>> >>>>> I would expect a device that has different size for different work >>>>> items to report max_qp_wr accounting for the largest work element that >>>>> the device supports, so it is universally correct. 
>>>>> >>>>> The fact that max_qp_wr means the maximum number of slots is a qp and >>>>> at the same time different work requests can arbitrarily use any >>>>> number >>>>> of slots without anyone ever knowing, makes it pretty much >>>>> impossible to >>>>> use reliably. >>>>> >>>>> Maybe rdma device attributes need a new attribute called >>>>> universal_max_qp_wr that is going to actually be reliable and not >>>>> guess-work? >>>> >>>> I see, the max_qp_wr is not as reliable as I imagined. Is there any >>>> another way to get a queue depth grater than 128 >>>> >>>> instead of changing NVME_RDMA_MAX_QUEUE_SIZE? >>>> >>> >>> When I added this limit to RDMA transports it was to avoid a >>> situation that a QP will fail to be created if one will ask a large >>> queue. >>> >>> I choose 128 since it was supported for all the RDMA adapters I've >>> tested in my lab (mostly Mellanox adapters). >>> For this queue depth we found that the performance is good enough and >>> it will not be improved if we will increase the depth. >>> >>> Are you saying that you have a device that can provide better >>> performance with qdepth > 128 ? >>> What is the tested qdepth and what are the numbers you see with this >>> qdepth ? >> >> Yeah, you are right, the improvement is small(about %1~2%), I do this >> only for better benchmark, > > Well, it doesn't come for free, you are essentially doubling the queue > depth. I'm also assuming that you tested a single initiator and a > single queue? > >> I still consist that using the capabilities of RDMA device to >> determine the size of queue is a better choice, but now I change the >> >> NVME_RDMA_MAX_QUEUE_SIZE to 256 for bidding. > > Still doesn't change the fact that its a pure guess-work if it is > supported by the device or not. > > Are you even able to create that queue depth with DIF workloads? > > Max, what is the maximum effective depth with DIF enabled? I'll need to check it. 
I'll prepare some patches to allow the RDMA queue_size to be 256 for non-PI controllers anyway. I would also like to add another configfs entry to determine the max queue size of a target port. Hope to merge it for the upcoming merge window. ^ permalink raw reply [flat|nested] 11+ messages in thread
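A per-port knob like the one Max describes would presumably live under nvmet's configfs tree alongside the existing port attributes. A hypothetical shell sketch of how such an entry might be used (the attribute name `param_max_queue_size`, the port ID, and the value are assumptions, not a merged interface):

```shell
# Hypothetical: cap the negotiated queue depth on an existing nvmet RDMA
# port (here port ID 1); the attribute name is illustrative only.
cd /sys/kernel/config/nvmet/ports/1
echo 256 > param_max_queue_size
cat param_max_queue_size
```

The target would then report this value (further bounded by device capabilities) in its get_max_queue_size callback instead of a hard-coded constant.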