From: Martin Wilck <martin.wilck@suse.com>
To: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>
Cc: Niklas Cassel <cassel@kernel.org>, Hannes Reinecke <hare@suse.de>,
Daniel Wagner <dwagner@suse.de>,
Stuart Hayes <stuart.w.hayes@gmail.com>,
linux-nvme@lists.infradead.org, Martin Wilck <mwilck@suse.com>
Subject: [PATCH v3] nvme: core: shorten duration of multipath namespace rescan
Date: Mon, 26 Aug 2024 18:39:51 +0200
Message-ID: <20240826163951.68078-1-mwilck@suse.com>
For multipath devices, nvme_update_ns_info() needs to freeze both
the queue of the path and the queue of the multipath device. Each
freeze waits for one RCU grace period to pass, ~25ms on my test
system. By calling blk_freeze_queue_start() for the multipath queue
early, we avoid waiting for two grace periods back to back; tests
using ftrace have shown that the second blk_mq_freeze_queue_wait()
call then finishes in just a few microseconds. The path queue is
unfrozen before blk_mq_freeze_queue_wait() is called on the
multipath queue, so that any outstanding I/O in the multipath queue
can be flushed.
I tested this using the "controller rescan under I/O load" test
I submitted recently [1].
[1] https://lore.kernel.org/linux-nvme/20240822193814.106111-3-mwilck@suse.com/T/#u
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
v3:
- added an out label and reversed the ret logic (Sagi Grimberg)
v2: (all changes suggested by Sagi Grimberg)
- patch subject changed from "nvme: core: freeze multipath queue early in
nvme_update_ns_info()" to "nvme: core: shorten duration of multipath
namespace rescan"
- inserted comment explaining why blk_freeze_queue_start() is called early
- wait for queue to be frozen even if ret != 0
- make code structure more obvious vs. freeze_start / freeze_wait / unfreeze
Hannes and Daniel had already added Reviewed-by tags to the v1 patch, but
I didn't carry them over, because the patch looks quite different now.
---
drivers/nvme/host/core.c | 88 ++++++++++++++++++++++++----------------
1 file changed, 52 insertions(+), 36 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0dc8bcc664f2..13164ca866ea 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2215,8 +2215,20 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
{
bool unsupported = false;
+ struct queue_limits *ns_lim;
+ struct queue_limits lim;
int ret;
+ /*
+ * The controller queue is going to be frozen in
+ * nvme_update_ns_info_{generic,block}(). Every freeze implies waiting
+ * for an RCU grace period to pass. For multipath devices, we
+ * need to freeze the multipath queue, too. Start freezing the
+ * multipath queue now, lest we need to wait for two grace periods.
+ */
+ if (nvme_ns_head_multipath(ns->head))
+ blk_freeze_queue_start(ns->head->disk->queue);
+
switch (info->ids.csi) {
case NVME_CSI_ZNS:
if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
@@ -2250,45 +2262,49 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
ret = 0;
}
- if (!ret && nvme_ns_head_multipath(ns->head)) {
- struct queue_limits *ns_lim = &ns->disk->queue->limits;
- struct queue_limits lim;
+ if (!nvme_ns_head_multipath(ns->head))
+ return ret;
- blk_mq_freeze_queue(ns->head->disk->queue);
- /*
- * queue_limits mixes values that are the hardware limitations
- * for bio splitting with what is the device configuration.
- *
- * For NVMe the device configuration can change after e.g. a
- * Format command, and we really want to pick up the new format
- * value here. But we must still stack the queue limits to the
- * least common denominator for multipathing to split the bios
- * properly.
- *
- * To work around this, we explicitly set the device
- * configuration to those that we just queried, but only stack
- * the splitting limits in to make sure we still obey possibly
- * lower limitations of other controllers.
- */
- lim = queue_limits_start_update(ns->head->disk->queue);
- lim.logical_block_size = ns_lim->logical_block_size;
- lim.physical_block_size = ns_lim->physical_block_size;
- lim.io_min = ns_lim->io_min;
- lim.io_opt = ns_lim->io_opt;
- queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
- ns->head->disk->disk_name);
- if (unsupported)
- ns->head->disk->flags |= GENHD_FL_HIDDEN;
- else
- nvme_init_integrity(ns->head, &lim, info);
- ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+ blk_mq_freeze_queue_wait(ns->head->disk->queue);
+ if (ret)
+ goto out;
- set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
- set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
- nvme_mpath_revalidate_paths(ns);
+ /*
+ * queue_limits mixes values that are the hardware limitations
+ * for bio splitting with what is the device configuration.
+ *
+ * For NVMe the device configuration can change after e.g. a
+ * Format command, and we really want to pick up the new format
+ * value here. But we must still stack the queue limits to the
+ * least common denominator for multipathing to split the bios
+ * properly.
+ *
+ * To work around this, we explicitly set the device
+ * configuration to those that we just queried, but only stack
+ * the splitting limits in to make sure we still obey possibly
+ * lower limitations of other controllers.
+ */
- blk_mq_unfreeze_queue(ns->head->disk->queue);
- }
+ ns_lim = &ns->disk->queue->limits;
+ lim = queue_limits_start_update(ns->head->disk->queue);
+ lim.logical_block_size = ns_lim->logical_block_size;
+ lim.physical_block_size = ns_lim->physical_block_size;
+ lim.io_min = ns_lim->io_min;
+ lim.io_opt = ns_lim->io_opt;
+ queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
+ ns->head->disk->disk_name);
+ if (unsupported)
+ ns->head->disk->flags |= GENHD_FL_HIDDEN;
+ else
+ nvme_init_integrity(ns->head, &lim, info);
+ ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+
+ set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
+ set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
+ nvme_mpath_revalidate_paths(ns);
+
+out:
+ blk_mq_unfreeze_queue(ns->head->disk->queue);
return ret;
}
--
2.46.0