public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: Sagi Grimberg <sagi@grimberg.me>, Yi Zhang <yi.zhang@redhat.com>
Cc: Max Gurtovoy <maxg@mellanox.com>,
	"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>,
	RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [bug report] NVMe/IB: reset_controller need more than 1min
Date: Sun, 20 Mar 2022 17:11:29 +0200	[thread overview]
Message-ID: <d8283461-8759-39ac-6ef7-a6f5cc739634@nvidia.com> (raw)
In-Reply-To: <7e0abcac-791a-bd71-38c3-4a5490b7f209@grimberg.me>


On 3/20/2022 3:03 PM, Sagi Grimberg wrote:
>
>>>> These are very long times for a non-debug kernel...
>>>> Max, do you see the root cause for this?
>>>>
>>>> Yi, does this happen with rxe/siw as well?
>>> Hi Sagi
>>>
>>> rxe/siw will take less than 1s
>>> with rdma_rxe
>>> # time nvme reset /dev/nvme0
>>> real 0m0.094s
>>> user 0m0.000s
>>> sys 0m0.006s
>>>
>>> with siw
>>> # time nvme reset /dev/nvme0
>>> real 0m0.097s
>>> user 0m0.000s
>>> sys 0m0.006s
>>>
>>> This is only reproducible with mlx IB card, as I mentioned before, the
>>> reset operation time changed from 3s to 12s after the below commit,
>>> could you check this commit?
>>>
>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
>>> Author: Max Gurtovoy <maxg@mellanox.com>
>>> Date:   Tue May 19 17:05:56 2020 +0300
>>>
>>>      nvme-rdma: add metadata/T10-PI support
>>>
>> I couldn't repro these long reset times.
>
> It appears to be when setting up a controller with lots of queues
> maybe?

I'm doing that.

>
>> Nevertheless, the above commit added T10-PI offloads.
>>
>> In this commit, for supported devices we create extra resources in HW 
>> (more memory keys per task).
>>
>> I suggested doing this configuration as part of the "nvme connect" 
>> command and save this resource allocation by default but during the 
>> review I was asked to make it the default behavior.
>
> Don't know if I gave you this feedback or not, but it probably didn't
> occur to the commenter that it will make the connection establishment
> take tens of seconds.

It was known that we create more resources than needed for 
"non-PI-desired" controllers.


>
>> Sagi/Christoph,
>>
>> WDYT ? should we reconsider the "nvme connect --with_metadata" option ?
>
> Maybe you can make these lazily allocated?

You mean something like:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd4720d37cc0..367ba0bb62ab 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1620,10 +1620,19 @@ int nvme_getgeo(struct block_device *bdev, 
struct hd_geometry *geo)
  }

  #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
-                               u32 max_integrity_segments)
+static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
  {
         struct blk_integrity integrity = { };
+       u16 ms = ns->ms;
+       u8 pi_type = ns->pi_type;
+       u32 max_integrity_segments = ns->ctrl->max_integrity_segments;
+       int ret;
+
+       if (ns->ctrl->ops->init_integrity) {
+               ret = ns->ctrl->ops->init_integrity(ns->ctrl);
+               if (ret)
+                       return ret;
+       }

         switch (pi_type) {
         case NVME_NS_DPS_PI_TYPE3:
@@ -1644,11 +1653,13 @@ static void nvme_init_integrity(struct gendisk 
*disk, u16 ms, u8 pi_type,
         integrity.tuple_size = ms;
         blk_integrity_register(disk, &integrity);
         blk_queue_max_integrity_segments(disk->queue, 
max_integrity_segments);
+
+       return 0;
  }
  #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
-                               u32 max_integrity_segments)
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
  {
+       return 0;
  }
  #endif /* CONFIG_BLK_DEV_INTEGRITY */

@@ -1853,8 +1864,8 @@ static void nvme_update_disk_info(struct gendisk 
*disk,
         if (ns->ms) {
                 if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
                     (ns->features & NVME_NS_METADATA_SUPPORTED))
-                       nvme_init_integrity(disk, ns->ms, ns->pi_type,
- ns->ctrl->max_integrity_segments);
+                       if (nvme_init_integrity(disk, ns))
+                               capacity = 0;
                 else if (!nvme_ns_has_pi(ns))
                         capacity = 0;
         }
@@ -4395,7 +4406,7 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);


and create the resources for the first namespace we find as PI formatted ?




  reply	other threads:[~2022-03-20 15:11 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21 16:12 [bug report] NVMe/IB: reset_controller need more than 1min Yi Zhang
2021-05-21 18:00 ` Sagi Grimberg
2021-05-22  4:27   ` Yi Zhang
2021-06-23 10:01     ` Yi Zhang
2021-06-23 21:32       ` Sagi Grimberg
2021-06-24 16:14         ` Yi Zhang
2021-12-11  3:01           ` Yi Zhang
2021-12-12  9:45             ` Sagi Grimberg
2021-12-13  6:12               ` Yi Zhang
2021-12-13  9:04                 ` Sagi Grimberg
2021-12-13 17:05                   ` Yi Zhang
2021-12-14 10:39                     ` Sagi Grimberg
2021-12-14 12:00                       ` Max Gurtovoy
2021-12-15  1:15                         ` Yi Zhang
2021-12-15 12:10                           ` Max Gurtovoy
2021-12-16  2:18                             ` Yi Zhang
2021-12-16 13:21                               ` Max Gurtovoy
2021-12-16 16:32                                 ` Yi Zhang
2021-12-16 17:33                                   ` Haakon Bugge
2021-12-17  7:03                                     ` Yi Zhang
2021-12-17 11:19                                       ` Haakon Bugge
2022-02-14  9:47                                         ` Yi Zhang
2022-02-14 11:00                                           ` Chaitanya Kulkarni
2022-02-14 11:32                                           ` Sagi Grimberg
2022-02-14 12:11                                             ` Max Gurtovoy
2022-02-15 13:52                                               ` Yi Zhang
2022-02-15 14:30                                                 ` Max Gurtovoy
2022-02-21 10:00                                                   ` Yi Zhang
2022-02-23 10:04                                                     ` Max Gurtovoy
2022-02-23 10:30                                                       ` Sagi Grimberg
2022-02-23 11:20                                                         ` Max Gurtovoy
2022-03-01  0:06                                                       ` Yi Zhang
2022-03-16 15:16                                                         ` Sagi Grimberg
2022-03-19  7:29                                                           ` Yi Zhang
2022-03-20 10:50                                                             ` Max Gurtovoy
2022-03-20 13:03                                                               ` Sagi Grimberg
2022-03-20 15:11                                                                 ` Max Gurtovoy [this message]
2022-03-21  9:28                                                                   ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d8283461-8759-39ac-6ef7-a6f5cc739634@nvidia.com \
    --to=mgurtovoy@nvidia.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=yi.zhang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox