* [PATCH 1/3] nvmet: Add mdts setting op for controllers
@ 2020-03-04 15:39 Max Gurtovoy
2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 15:39 UTC (permalink / raw)
To: jgg, linux-nvme, sagi, hch, kbusch, Chaitanya.Kulkarni
Cc: krishna2, Max Gurtovoy, bharat, nirranjan
Some transports, such as RDMA, would like to set the Maximum Data
Transfer Size (MDTS) according to device/port/ctrl characteristics.
This will enable the transport to set the optimal MDTS according to
controller needs and device capabilities. Add a new nvmet transport
op that is called during ctrl identification. This will not affect
transports that don't set this option.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
drivers/nvme/target/admin-cmd.c | 8 ++++++--
drivers/nvme/target/nvmet.h | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index c0aa9c3..bbbb502 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -369,8 +369,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	/* we support multiple ports, multiples hosts and ANA: */
 	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
 
-	/* no limit on data transfer sizes for now */
-	id->mdts = 0;
+	/* Limit MDTS according to transport capability */
+	if (ctrl->ops->set_mdts)
+		id->mdts = ctrl->ops->set_mdts(ctrl);
+	else
+		id->mdts = 0;
+
 	id->cntlid = cpu_to_le16(ctrl->cntlid);
 	id->ver = cpu_to_le32(ctrl->subsys->ver);
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 42ba2dd..1ae41fd 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -289,6 +289,7 @@ struct nvmet_fabrics_ops {
 			struct nvmet_port *port, char *traddr);
 	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
 	void (*discovery_chg)(struct nvmet_port *port);
+	u8 (*set_mdts)(struct nvmet_ctrl *ctrl);
 };
 
 #define NVMET_MAX_INLINE_BIOVEC	8
--
1.8.3.1
* [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 15:39 [PATCH 1/3] nvmet: Add mdts setting op for controllers Max Gurtovoy
@ 2020-03-04 15:39 ` Max Gurtovoy
  2020-03-04 16:01   ` Christoph Hellwig
  2020-03-04 16:32   ` Christoph Hellwig
  2020-03-04 15:39 ` [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts Max Gurtovoy
  ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 15:39 UTC (permalink / raw)
To: jgg, linux-nvme, sagi, hch, kbusch, Chaitanya.Kulkarni
Cc: krishna2, Max Gurtovoy, bharat, nirranjan

Set the maximal data transfer size to be 1MB (currently mdts is
unlimited). This will allow calculating the number of MRs that
one ctrl should allocate to fulfill its capabilities.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 37d262a..5ba76d2 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -30,6 +30,7 @@
 #define NVMET_RDMA_DEFAULT_INLINE_DATA_SIZE	PAGE_SIZE
 #define NVMET_RDMA_MAX_INLINE_SGE		4
 #define NVMET_RDMA_MAX_INLINE_DATA_SIZE	max_t(int, SZ_16K, PAGE_SIZE)
+#define NVMET_RDMA_MAX_MDTS			8
 
 struct nvmet_rdma_cmd {
 	struct ib_sge sge[NVMET_RDMA_MAX_INLINE_SGE + 1];
@@ -1602,6 +1603,12 @@ static void nvmet_rdma_disc_port_addr(struct nvmet_req *req,
 	}
 }
 
+static u8 nvmet_rdma_set_mdts(struct nvmet_ctrl *ctrl)
+{
+	/* Assume mpsmin == device_page_size == 4KB */
+	return NVMET_RDMA_MAX_MDTS;
+}
+
 static const struct nvmet_fabrics_ops nvmet_rdma_ops = {
 	.owner			= THIS_MODULE,
 	.type			= NVMF_TRTYPE_RDMA,
@@ -1612,6 +1619,7 @@ static void nvmet_rdma_disc_port_addr(struct nvmet_req *req,
 	.queue_response		= nvmet_rdma_queue_response,
 	.delete_ctrl		= nvmet_rdma_delete_ctrl,
 	.disc_traddr		= nvmet_rdma_disc_port_addr,
+	.set_mdts		= nvmet_rdma_set_mdts,
 };
 
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
-- 
1.8.3.1
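For reference, MDTS is expressed as a power of two in units of the controller's
minimum memory page size (CAP.MPSMIN), so the value 8 above corresponds to
2^8 * 4KB = 1MB under the 4KB assumption noted in the patch. Below is a
standalone sketch of that arithmetic (illustrative only, not part of the patch;
the variable names are made up for the example):

/* Illustrative sketch: how an MDTS value translates to a byte limit. */
/* Assumes mpsmin == 4KB (page shift 12), as the patch comment does.  */
#include <stdio.h>

int main(void)
{
	unsigned int mdts = 8;                  /* NVMET_RDMA_MAX_MDTS in the patch above */
	unsigned int mpsmin_shift = 12;         /* assumed 4KB minimum page size */
	unsigned long long max_bytes = 1ULL << (mdts + mpsmin_shift);

	/* 2^8 pages * 4KB/page = 1MB */
	printf("max transfer size: %llu bytes\n", max_bytes);
	return 0;
}

A transport that does not implement the op keeps reporting mdts = 0, which
means "no restriction" and matches the previous behavior.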
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
@ 2020-03-04 16:01 ` Christoph Hellwig
  2020-03-04 16:15   ` Max Gurtovoy
  2020-03-04 16:32 ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:01 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, hch

On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
> Set the maximal data transfer size to be 1MB (currently mdts is
> unlimited). This will allow calculating the number of MRs that
> one ctrl should allocate to fulfill its capabilities.

Do we really need an op for this vs just setting the field in
nvme_rdma_create_ctrl?
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 16:01 ` Christoph Hellwig
@ 2020-03-04 16:15   ` Max Gurtovoy
  2020-03-04 16:18     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 16:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch

On 3/4/2020 6:01 PM, Christoph Hellwig wrote:
> On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
>> Set the maximal data transfer size to be 1MB (currently mdts is
>> unlimited). This will allow calculating the number of MRs that
>> one ctrl should allocate to fulfill its capabilities.
> Do we really need an op for this vs just setting the field in
> nvme_rdma_create_ctrl?

This is target side :)

But we can add it to .add_port and limit/set it through configfs as we do
for inline_data_size (with default of 1MB).

Let me know what's preferred.
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 16:15   ` Max Gurtovoy
@ 2020-03-04 16:18     ` Christoph Hellwig
  2020-03-04 16:26       ` Max Gurtovoy
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:18 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, Christoph Hellwig

On Wed, Mar 04, 2020 at 06:15:55PM +0200, Max Gurtovoy wrote:
>
> On 3/4/2020 6:01 PM, Christoph Hellwig wrote:
>> On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
>>> Set the maximal data transfer size to be 1MB (currently mdts is
>>> unlimited). This will allow calculating the number of MRs that
>>> one ctrl should allocate to fulfill its capabilities.
>> Do we really need an op for this vs just setting the field in
>> nvme_rdma_create_ctrl?
>
> This is target side :)
>
> But we can add it to .add_port and limit/set it through configfs as we do
> for inline_data_size (with default of 1MB).
>
> Let me know what's preferred.

I'm always in favor of avoiding indirect function calls if we can.  But
I don't think we need a configfs interface, just do add_port for now.
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 16:18     ` Christoph Hellwig
@ 2020-03-04 16:26       ` Max Gurtovoy
  2020-03-04 16:30         ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 16:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch

On 3/4/2020 6:18 PM, Christoph Hellwig wrote:
> On Wed, Mar 04, 2020 at 06:15:55PM +0200, Max Gurtovoy wrote:
>> On 3/4/2020 6:01 PM, Christoph Hellwig wrote:
>>> On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
>>>> Set the maximal data transfer size to be 1MB (currently mdts is
>>>> unlimited). This will allow calculating the number of MRs that
>>>> one ctrl should allocate to fulfill its capabilities.
>>> Do we really need an op for this vs just setting the field in
>>> nvme_rdma_create_ctrl?
>> This is target side :)
>>
>> But we can add it to .add_port and limit/set it through configfs as we do
>> for inline_data_size (with default of 1MB).
>>
>> Let me know what's preferred.
> I'm always in favor of avoiding indirect function calls if we can.  But
> I don't think we need a configfs interface, just do add_port for now.

Wait, I forgot that this will not be compatible with adding MD/T10-PI
(under review), since a port might have a pi-ctrl and a non-pi-ctrl.

So better to have it per controller, as in this series.
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 16:26       ` Max Gurtovoy
@ 2020-03-04 16:30         ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:30 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, Christoph Hellwig

On Wed, Mar 04, 2020 at 06:26:15PM +0200, Max Gurtovoy wrote:
>
> On 3/4/2020 6:18 PM, Christoph Hellwig wrote:
>> On Wed, Mar 04, 2020 at 06:15:55PM +0200, Max Gurtovoy wrote:
>>> On 3/4/2020 6:01 PM, Christoph Hellwig wrote:
>>>> On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
>>>>> Set the maximal data transfer size to be 1MB (currently mdts is
>>>>> unlimited). This will allow calculating the number of MRs that
>>>>> one ctrl should allocate to fulfill its capabilities.
>>>> Do we really need an op for this vs just setting the field in
>>>> nvme_rdma_create_ctrl?
>>> This is target side :)
>>>
>>> But we can add it to .add_port and limit/set it through configfs as we do
>>> for inline_data_size (with default of 1MB).
>>>
>>> Let me know what's preferred.
>> I'm always in favor of avoiding indirect function calls if we can.  But
>> I don't think we need a configfs interface, just do add_port for now.
>
> Wait, I forgot that this will not be compatible with adding MD/T10-PI
> (under review), since a port might have a pi-ctrl and a non-pi-ctrl.
>
> So better to have it per controller, as in this series.

Oh well, let's go ahead with your version and I'll see if I can clean
this up later.  I really hate the model of having callbacks for every
little setting as in the SCSI target code, as it leads to a real mess.
But this is an obvious fix, and we also want to get the PI code in ASAP,
so I'd rather not hold it back.
* Re: [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op
  2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
  2020-03-04 16:01 ` Christoph Hellwig
@ 2020-03-04 16:32 ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:32 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, hch

On Wed, Mar 04, 2020 at 05:39:34PM +0200, Max Gurtovoy wrote:
> Set the maximal data transfer size to be 1MB (currently mdts is
> unlimited). This will allow calculating the number of MRs that
> one ctrl should allocate to fulfill its capabilities.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
* [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts
  2020-03-04 15:39 [PATCH 1/3] nvmet: Add mdts setting op for controllers Max Gurtovoy
  2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
@ 2020-03-04 15:39 ` Max Gurtovoy
  2020-03-04 16:32   ` Christoph Hellwig
  2020-03-04 19:18   ` Krishnamraju Eraparaju
  2020-03-04 16:31 ` [PATCH 1/3] nvmet: Add mdts setting op for controllers Christoph Hellwig
  2020-03-04 16:36 ` Bart Van Assche
  3 siblings, 2 replies; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 15:39 UTC (permalink / raw)
To: jgg, linux-nvme, sagi, hch, kbusch, Chaitanya.Kulkarni
Cc: krishna2, Max Gurtovoy, bharat, nirranjan

Current nvmet-rdma code allocates MR pool budget based on queue size,
assuming both host and target use the same "max_pages_per_mr" count.
After limiting the mdts value for RDMA controllers, we know the factor
of maximum MRs per IO operation. Thus, make sure MR pool will be
sufficient for the required IO depth and IO size.

That is, say host's SQ size is 100, then the MR pool budget allocated
currently at target will also be 100 MRs. But 100 IO WRITE Requests
with 256 sg_count (IO size above 1MB) require 200 MRs when target's
"max_pages_per_mr" is 128.

Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 5ba76d2..a6c9d11 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -976,7 +976,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 {
 	struct ib_qp_init_attr qp_attr;
 	struct nvmet_rdma_device *ndev = queue->dev;
-	int comp_vector, nr_cqe, ret, i;
+	int comp_vector, nr_cqe, ret, i, factor;
 
 	/*
 	 * Spread the io queues across completion vectors,
@@ -1009,7 +1009,9 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 	qp_attr.qp_type = IB_QPT_RC;
 	/* +1 for drain */
 	qp_attr.cap.max_send_wr = queue->send_queue_size + 1;
-	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size;
+	factor = rdma_rw_mr_factor(ndev->device, queue->cm_id->port_num,
+				   1 << NVMET_RDMA_MAX_MDTS);
+	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size * factor;
 	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
 				ndev->device->attrs.max_send_sge);
 
-- 
1.8.3.1
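The factor used above corresponds, roughly, to how many MRs a single
maximum-sized I/O can consume: the maximum I/O size in pages divided, rounding
up, by the device's pages-per-MR limit. Below is a user-space sketch of that
sizing with the example numbers from the commit message (illustrative only;
the in-kernel rdma_rw_mr_factor() helper also accounts for device-specific
details):

/* Illustrative sketch of the MR-pool sizing described in the commit message. */
#include <stdio.h>

static unsigned int mr_factor(unsigned int max_io_pages, unsigned int max_pages_per_mr)
{
	/* MRs needed per I/O: ceil(max_io_pages / max_pages_per_mr) */
	return (max_io_pages + max_pages_per_mr - 1) / max_pages_per_mr;
}

int main(void)
{
	unsigned int max_io_pages = 256;        /* 1 << NVMET_RDMA_MAX_MDTS, i.e. 1MB in 4KB pages */
	unsigned int max_pages_per_mr = 128;    /* example device limit from the commit message */
	unsigned int queue_size = 100;          /* example SQ size from the commit message */
	unsigned int factor = mr_factor(max_io_pages, max_pages_per_mr);

	/* 256 / 128 = 2, so a 100-entry queue needs 200 RW contexts (MRs) */
	printf("factor=%u max_rdma_ctxs=%u\n", factor, queue_size * factor);
	return 0;
}

With a device that can register 256 or more pages per MR the factor collapses
to 1 and the allocation matches the old behavior.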
* Re: [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts
  2020-03-04 15:39 ` [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts Max Gurtovoy
@ 2020-03-04 16:32   ` Christoph Hellwig
  2020-03-04 19:18   ` Krishnamraju Eraparaju
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:32 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, hch

On Wed, Mar 04, 2020 at 05:39:35PM +0200, Max Gurtovoy wrote:
> Current nvmet-rdma code allocates MR pool budget based on queue size,
> assuming both host and target use the same "max_pages_per_mr" count.
> After limiting the mdts value for RDMA controllers, we know the factor
> of maximum MRs per IO operation. Thus, make sure MR pool will be
> sufficient for the required IO depth and IO size.
>
> That is, say host's SQ size is 100, then the MR pool budget allocated
> currently at target will also be 100 MRs. But 100 IO WRITE Requests
> with 256 sg_count (IO size above 1MB) require 200 MRs when target's
> "max_pages_per_mr" is 128.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts
  2020-03-04 15:39 ` [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts Max Gurtovoy
  2020-03-04 16:32   ` Christoph Hellwig
@ 2020-03-04 19:18   ` Krishnamraju Eraparaju
  2020-03-04 22:19     ` Max Gurtovoy
  1 sibling, 1 reply; 16+ messages in thread
From: Krishnamraju Eraparaju @ 2020-03-04 19:18 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, jgg,
    kbusch, hch

Hi Max Gurtovoy,

I just tested this patch series, the issue is not occurring with these
patches.

Have a couple of questions:
- Say both host & target have a max_fr_pages size of 128 pages, then
the number of MRs allocated at target will be twice the size of
send_queue_size, as NVMET_RDMA_MAX_MDTS is set to 256 pages.

so, in this case, as host can never request an IO of size greater
than 128 pages, half of the MRs allocated at target will always be
left unused.

If this is true, will this be a concern in future when the
NVMET_RDMA_MAX_MDTS limit is increased, but the max_fr_pages
size of a few devices remains at 128 pages?

- Also, will just passing the optimal mdts (derived based on
max_fr_pages) to the host during ctrl identification fix this issue
properly (instead of increasing the max_rdma_ctxs with factor)? I think
the target doesn't require multiple MRs in this case as host's blk
max_segments got tuned with target's mdts.

Please correct me if I'm wrong.

Thanks,
Krishna.
On Wednesday, March 03/04/20, 2020 at 17:39:35 +0200, Max Gurtovoy wrote:
> Current nvmet-rdma code allocates MR pool budget based on queue size,
> assuming both host and target use the same "max_pages_per_mr" count.
> After limiting the mdts value for RDMA controllers, we know the factor
> of maximum MRs per IO operation. Thus, make sure MR pool will be
> sufficient for the required IO depth and IO size.
>
> That is, say host's SQ size is 100, then the MR pool budget allocated
> currently at target will also be 100 MRs. But 100 IO WRITE Requests
> with 256 sg_count (IO size above 1MB) require 200 MRs when target's
> "max_pages_per_mr" is 128.
>
> Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> ---
>  drivers/nvme/target/rdma.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 5ba76d2..a6c9d11 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -976,7 +976,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
>  {
>  	struct ib_qp_init_attr qp_attr;
>  	struct nvmet_rdma_device *ndev = queue->dev;
> -	int comp_vector, nr_cqe, ret, i;
> +	int comp_vector, nr_cqe, ret, i, factor;
>
>  	/*
>  	 * Spread the io queues across completion vectors,
> @@ -1009,7 +1009,9 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
>  	qp_attr.qp_type = IB_QPT_RC;
>  	/* +1 for drain */
>  	qp_attr.cap.max_send_wr = queue->send_queue_size + 1;
> -	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size;
> +	factor = rdma_rw_mr_factor(ndev->device, queue->cm_id->port_num,
> +				   1 << NVMET_RDMA_MAX_MDTS);
> +	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size * factor;
>  	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
>  				ndev->device->attrs.max_send_sge);
>
> --
> 1.8.3.1
>
* Re: [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts
  2020-03-04 19:18   ` Krishnamraju Eraparaju
@ 2020-03-04 22:19     ` Max Gurtovoy
  2020-03-05  9:58       ` Krishnamraju Eraparaju
  0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 22:19 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, jgg,
    kbusch, hch

On 3/4/2020 9:18 PM, Krishnamraju Eraparaju wrote:
> Hi Max Gurtovoy,
>
> I just tested this patch series, the issue is not occurring with these
> patches.
>
> Have a couple of questions:
> - Say both host & target have a max_fr_pages size of 128 pages, then
> the number of MRs allocated at target will be twice the size of
> send_queue_size, as NVMET_RDMA_MAX_MDTS is set to 256 pages.
>
> so, in this case, as host can never request an IO of size greater
> than 128 pages, half of the MRs allocated at target will always be
> left unused.
>
> If this is true, will this be a concern in future when the
> NVMET_RDMA_MAX_MDTS limit is increased, but the max_fr_pages
> size of a few devices remains at 128 pages?

for this I suggested a configfs entry so a user would be able to
configure the target mdts as a QoS and/or to save resources.

Currently this suggestion is not accepted but let's re-think about
it in the future (I think adding some configfs entries for saving
resources such as q_depth, mdts, num_queues, etc might be helpful
for some users).

On the other hand, I didn't limit the mdts even for devices with
a small max_fr_pages in the target side so it will be able
to work with a host that can send "big" IOs (with multiple MRs in the
target side).

I think this is the right approach - better support capable devices
and sometimes allocate more than required from host.

The target acts as a subsystem controller and exposes its mdts,
exactly as the pci ctrl exposes it. Sometimes it's bigger than the
max_io_size we actually need and it's fine :)

>
>
> - Also, will just passing the optimal mdts (derived based on
> max_fr_pages) to the host during ctrl identification fix this issue
> properly (instead of increasing the max_rdma_ctxs with factor)? I think
> the target doesn't require multiple MRs in this case as host's blk
> max_segments got tuned with target's mdts.
>
> Please correct me if I'm wrong.

Linux host max_io_size is also set to 1MB (if the device is capable
of it) so you actually won't be needing multiple MRs per IO.

I don't know what's optimal_mdts since some users would like to send
1MB IOs and not split it to 4 requests of 256KB in the host side.

And since we use the RW api we always need the factor because it might
be limited by the API one day (today the limit is 256 pages in the RW
api).

From your question, I understand that your device can support up to
512K IOs but I think it will be a good idea not to limit hosts that
use other devices with a target that uses your devices.

>
> Thanks,
> Krishna.
> On Wednesday, March 03/04/20, 2020 at 17:39:35 +0200, Max Gurtovoy wrote:
>> Current nvmet-rdma code allocates MR pool budget based on queue size,
>> assuming both host and target use the same "max_pages_per_mr" count.
>> After limiting the mdts value for RDMA controllers, we know the factor
>> of maximum MRs per IO operation. Thus, make sure MR pool will be
>> sufficient for the required IO depth and IO size.
>>
>> That is, say host's SQ size is 100, then the MR pool budget allocated
>> currently at target will also be 100 MRs. But 100 IO WRITE Requests
>> with 256 sg_count (IO size above 1MB) require 200 MRs when target's
>> "max_pages_per_mr" is 128.
>>
>> Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> ---
>>  drivers/nvme/target/rdma.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 5ba76d2..a6c9d11 100644
>> --- a/drivers/nvme/target/rdma.c
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -976,7 +976,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
>>  {
>>  	struct ib_qp_init_attr qp_attr;
>>  	struct nvmet_rdma_device *ndev = queue->dev;
>> -	int comp_vector, nr_cqe, ret, i;
>> +	int comp_vector, nr_cqe, ret, i, factor;
>>
>>  	/*
>>  	 * Spread the io queues across completion vectors,
>> @@ -1009,7 +1009,9 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
>>  	qp_attr.qp_type = IB_QPT_RC;
>>  	/* +1 for drain */
>>  	qp_attr.cap.max_send_wr = queue->send_queue_size + 1;
>> -	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size;
>> +	factor = rdma_rw_mr_factor(ndev->device, queue->cm_id->port_num,
>> +				   1 << NVMET_RDMA_MAX_MDTS);
>> +	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size * factor;
>>  	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
>>  				ndev->device->attrs.max_send_sge);
>> --
>> 1.8.3.1
>>
* Re: [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts
  2020-03-04 22:19     ` Max Gurtovoy
@ 2020-03-05  9:58       ` Krishnamraju Eraparaju
  0 siblings, 0 replies; 16+ messages in thread
From: Krishnamraju Eraparaju @ 2020-03-05 9:58 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, jgg,
    kbusch, hch

On Thursday, March 03/05/20, 2020 at 00:19:01 +0200, Max Gurtovoy wrote:
>
> On 3/4/2020 9:18 PM, Krishnamraju Eraparaju wrote:
> >Hi Max Gurtovoy,
> >
> >I just tested this patch series, the issue is not occurring with these
> >patches.
> >
> >Have a couple of questions:
> >- Say both host & target have a max_fr_pages size of 128 pages, then
> >the number of MRs allocated at target will be twice the size of
> >send_queue_size, as NVMET_RDMA_MAX_MDTS is set to 256 pages.
> >
> >so, in this case, as host can never request an IO of size greater
> >than 128 pages, half of the MRs allocated at target will always be
> >left unused.
> >
> >If this is true, will this be a concern in future when the
> >NVMET_RDMA_MAX_MDTS limit is increased, but the max_fr_pages
> >size of a few devices remains at 128 pages?
>
> for this I suggested a configfs entry so a user would be able to
> configure the target mdts as a QoS and/or to save resources.
>
> Currently this suggestion is not accepted but let's re-think about
> it in the future (I think adding some configfs entries for saving
> resources such as q_depth, mdts, num_queues, etc might be helpful
> for some users).
>
> On the other hand, I didn't limit the mdts even for devices with
> a small max_fr_pages in the target side so it will be able
> to work with a host that can send "big" IOs (with multiple MRs in the
> target side).
>
> I think this is the right approach - better support capable devices
> and sometimes allocate more than required from host.
>
> The target acts as a subsystem controller and exposes its mdts,
> exactly as the pci ctrl exposes it. Sometimes it's bigger than the
> max_io_size we actually need and it's fine :)
>
> >
> >- Also, will just passing the optimal mdts (derived based on
> >max_fr_pages) to the host during ctrl identification fix this issue
> >properly (instead of increasing the max_rdma_ctxs with factor)? I think
> >the target doesn't require multiple MRs in this case as host's blk
> >max_segments got tuned with target's mdts.
> >
> >Please correct me if I'm wrong.
>
> Linux host max_io_size is also set to 1MB (if the device is capable
> of it) so you actually won't be needing multiple MRs per IO.
>
> I don't know what's optimal_mdts since some users would like to send
> 1MB IOs and not split it to 4 requests of 256KB in the host side.
>
> And since we use the RW api we always need the factor because it might
> be limited by the API one day (today the limit is 256 pages in the RW
> api).
>
> From your question, I understand that your device can support up to
> 512K IOs but I think it will be a good idea not to limit hosts that
> use other devices with a target that uses your devices.

Thanks for the clarification!

Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>

> >Thanks,
> >Krishna.
> >On Wednesday, March 03/04/20, 2020 at 17:39:35 +0200, Max Gurtovoy wrote:
> >>Current nvmet-rdma code allocates MR pool budget based on queue size,
> >>assuming both host and target use the same "max_pages_per_mr" count.
> >>After limiting the mdts value for RDMA controllers, we know the factor
> >>of maximum MRs per IO operation. Thus, make sure MR pool will be
> >>sufficient for the required IO depth and IO size.
> >>
> >>That is, say host's SQ size is 100, then the MR pool budget allocated
> >>currently at target will also be 100 MRs. But 100 IO WRITE Requests
> >>with 256 sg_count (IO size above 1MB) require 200 MRs when target's
> >>"max_pages_per_mr" is 128.
> >>
> >>Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
> >>Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> >>---
> >>  drivers/nvme/target/rdma.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> >>index 5ba76d2..a6c9d11 100644
> >>--- a/drivers/nvme/target/rdma.c
> >>+++ b/drivers/nvme/target/rdma.c
> >>@@ -976,7 +976,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
> >>  {
> >>  	struct ib_qp_init_attr qp_attr;
> >>  	struct nvmet_rdma_device *ndev = queue->dev;
> >>-	int comp_vector, nr_cqe, ret, i;
> >>+	int comp_vector, nr_cqe, ret, i, factor;
> >>
> >>  	/*
> >>  	 * Spread the io queues across completion vectors,
> >>@@ -1009,7 +1009,9 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
> >>  	qp_attr.qp_type = IB_QPT_RC;
> >>  	/* +1 for drain */
> >>  	qp_attr.cap.max_send_wr = queue->send_queue_size + 1;
> >>-	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size;
> >>+	factor = rdma_rw_mr_factor(ndev->device, queue->cm_id->port_num,
> >>+				   1 << NVMET_RDMA_MAX_MDTS);
> >>+	qp_attr.cap.max_rdma_ctxs = queue->send_queue_size * factor;
> >>  	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
> >>  				ndev->device->attrs.max_send_sge);
> >>--
> >>1.8.3.1
> >>
* Re: [PATCH 1/3] nvmet: Add mdts setting op for controllers
  2020-03-04 15:39 [PATCH 1/3] nvmet: Add mdts setting op for controllers Max Gurtovoy
  2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
  2020-03-04 15:39 ` [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts Max Gurtovoy
@ 2020-03-04 16:31 ` [PATCH 1/3] nvmet: Add mdts setting op for controllers Christoph Hellwig
  2020-03-04 16:36 ` Bart Van Assche
  3 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-03-04 16:31 UTC (permalink / raw)
To: Max Gurtovoy
Cc: sagi, Chaitanya.Kulkarni, bharat, nirranjan, linux-nvme, krishna2,
    jgg, kbusch, hch

On Wed, Mar 04, 2020 at 05:39:33PM +0200, Max Gurtovoy wrote:
> Some transports, such as RDMA, would like to set the Maximum Data
> Transfer Size (MDTS) according to device/port/ctrl characteristics.
> This will enable the transport to set the optimal MDTS according to
> controller needs and device capabilities. Add a new nvmet transport
> op that is called during ctrl identification. This will not affect
> transports that don't set this option.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 1/3] nvmet: Add mdts setting op for controllers
  2020-03-04 15:39 [PATCH 1/3] nvmet: Add mdts setting op for controllers Max Gurtovoy
  ` (2 preceding siblings ...)
  2020-03-04 16:31 ` [PATCH 1/3] nvmet: Add mdts setting op for controllers Christoph Hellwig
@ 2020-03-04 16:36 ` Bart Van Assche
  2020-03-04 16:48   ` Max Gurtovoy
  3 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2020-03-04 16:36 UTC (permalink / raw)
To: Max Gurtovoy, jgg, linux-nvme, sagi, hch, kbusch, Chaitanya.Kulkarni
Cc: krishna2, bharat, nirranjan

On 3/4/20 7:39 AM, Max Gurtovoy wrote:
> Some transports, such as RDMA, would like to set the Maximum Data
> Transfer Size (MDTS) according to device/port/ctrl characteristics.
> This will enable the transport to set the optimal MDTS according to
> controller needs and device capabilities. Add a new nvmet transport
> op that is called during ctrl identification. This will not affect
> transports that don't set this option.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> ---
>  drivers/nvme/target/admin-cmd.c | 8 ++++++--
>  drivers/nvme/target/nvmet.h     | 1 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index c0aa9c3..bbbb502 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -369,8 +369,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
>  	/* we support multiple ports, multiples hosts and ANA: */
>  	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
>
> -	/* no limit on data transfer sizes for now */
> -	id->mdts = 0;
> +	/* Limit MDTS according to transport capability */
> +	if (ctrl->ops->set_mdts)
> +		id->mdts = ctrl->ops->set_mdts(ctrl);
> +	else
> +		id->mdts = 0;
> +
>  	id->cntlid = cpu_to_le16(ctrl->cntlid);
>  	id->ver = cpu_to_le32(ctrl->subsys->ver);
>
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 42ba2dd..1ae41fd 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -289,6 +289,7 @@ struct nvmet_fabrics_ops {
>  			struct nvmet_port *port, char *traddr);
>  	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
>  	void (*discovery_chg)(struct nvmet_port *port);
> +	u8 (*set_mdts)(struct nvmet_ctrl *ctrl);
>  };
>
>  #define NVMET_MAX_INLINE_BIOVEC	8

Please document the semantics of set_mdts. Is it assumed to return an
MDTS value or is it assumed to modify an attribute of 'ctrl'? In the
former case, how about renaming 'set_mdts' into 'get_mdts' and
declaring the 'ctrl' pointer const? How about documenting the meaning
of the return value?

Thanks,

Bart.
* Re: [PATCH 1/3] nvmet: Add mdts setting op for controllers
  2020-03-04 16:36 ` Bart Van Assche
@ 2020-03-04 16:48   ` Max Gurtovoy
  0 siblings, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2020-03-04 16:48 UTC (permalink / raw)
To: Bart Van Assche, jgg, linux-nvme, sagi, hch, kbusch, Chaitanya.Kulkarni
Cc: krishna2, bharat, nirranjan

On 3/4/2020 6:36 PM, Bart Van Assche wrote:
> On 3/4/20 7:39 AM, Max Gurtovoy wrote:
>> Some transports, such as RDMA, would like to set the Maximum Data
>> Transfer Size (MDTS) according to device/port/ctrl characteristics.
>> This will enable the transport to set the optimal MDTS according to
>> controller needs and device capabilities. Add a new nvmet transport
>> op that is called during ctrl identification. This will not affect
>> transports that don't set this option.
>>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>> ---
>>  drivers/nvme/target/admin-cmd.c | 8 ++++++--
>>  drivers/nvme/target/nvmet.h     | 1 +
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index c0aa9c3..bbbb502 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -369,8 +369,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
>>  	/* we support multiple ports, multiples hosts and ANA: */
>>  	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
>>
>> -	/* no limit on data transfer sizes for now */
>> -	id->mdts = 0;
>> +	/* Limit MDTS according to transport capability */
>> +	if (ctrl->ops->set_mdts)
>> +		id->mdts = ctrl->ops->set_mdts(ctrl);
>> +	else
>> +		id->mdts = 0;
>> +
>>  	id->cntlid = cpu_to_le16(ctrl->cntlid);
>>  	id->ver = cpu_to_le32(ctrl->subsys->ver);
>>
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 42ba2dd..1ae41fd 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -289,6 +289,7 @@ struct nvmet_fabrics_ops {
>>  			struct nvmet_port *port, char *traddr);
>>  	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
>>  	void (*discovery_chg)(struct nvmet_port *port);
>> +	u8 (*set_mdts)(struct nvmet_ctrl *ctrl);
>>  };
>>
>>  #define NVMET_MAX_INLINE_BIOVEC	8
>
> Please document the semantics of set_mdts. Is it assumed to return an
> MDTS value or is it assumed to modify an attribute of 'ctrl'? In the
> former case, how about renaming 'set_mdts' into 'get_mdts' and
> declaring the 'ctrl' pointer const? How about documenting the meaning
> of the return value?

get_mdts sounds good to me.

I'll send V2 tomorrow.

Hopefully Krishnamraju will confirm that this series fixes his issue.

> Thanks,
>
> Bart.
end of thread, other threads:[~2020-03-05  9:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-04 15:39 [PATCH 1/3] nvmet: Add mdts setting op for controllers Max Gurtovoy
2020-03-04 15:39 ` [PATCH 2/3] nvmet-rdma: Implement set_mdts controller op Max Gurtovoy
2020-03-04 16:01   ` Christoph Hellwig
2020-03-04 16:15     ` Max Gurtovoy
2020-03-04 16:18       ` Christoph Hellwig
2020-03-04 16:26         ` Max Gurtovoy
2020-03-04 16:30           ` Christoph Hellwig
2020-03-04 16:32   ` Christoph Hellwig
2020-03-04 15:39 ` [PATCH 3/3] nvmet-rdma: allocate RW ctxs according to mdts Max Gurtovoy
2020-03-04 16:32   ` Christoph Hellwig
2020-03-04 19:18   ` Krishnamraju Eraparaju
2020-03-04 22:19     ` Max Gurtovoy
2020-03-05  9:58       ` Krishnamraju Eraparaju
2020-03-04 16:31 ` [PATCH 1/3] nvmet: Add mdts setting op for controllers Christoph Hellwig
2020-03-04 16:36 ` Bart Van Assche
2020-03-04 16:48   ` Max Gurtovoy