Linux-NVME Archive on lore.kernel.org
* [PATCH v3] nvme: core: shorten duration of multipath namespace rescan
@ 2024-08-26 16:39 Martin Wilck
  2024-08-27  6:33 ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Wilck @ 2024-08-26 16:39 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Niklas Cassel, Hannes Reinecke, Daniel Wagner, Stuart Hayes,
	linux-nvme, Martin Wilck

For multipath devices, nvme_update_ns_info() needs to freeze both
the queue of the path and the queue of the multipath device. For
both operations, it waits for one RCU grace period to pass, ~25ms
on my test system. By calling blk_freeze_queue_start() for the
multipath queue early, we avoid waiting twice; tests using ftrace
have shown that the second blk_mq_freeze_queue_wait() call finishes
in just a few microseconds. The path queue is unfrozen before
blk_mq_freeze_queue_wait() is called on the multipath queue, so that
any outstanding I/O in the multipath queue can be flushed.
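
In pseudocode, the change to the freeze ordering looks roughly like this
(a simplified sketch of the relevant calls, not the actual diff below):

```c
/* Before: the two freezes are serialized, each waiting for its own
 * RCU grace period (~25 ms each on the test system):
 */
blk_mq_freeze_queue(path_q);		/* waits one grace period */
/* ... update path ... */
blk_mq_unfreeze_queue(path_q);
blk_mq_freeze_queue(mpath_q);		/* waits another grace period */

/* After: the multipath freeze is started early, so both grace
 * periods elapse concurrently:
 */
blk_freeze_queue_start(mpath_q);	/* start freeze, do not wait */
blk_mq_freeze_queue(path_q);		/* grace period elapses here */
/* ... update path ... */
blk_mq_unfreeze_queue(path_q);		/* let pending mpath I/O drain */
blk_mq_freeze_queue_wait(mpath_q);	/* finishes in microseconds */
```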

I tested this using the "controller rescan under I/O load" test
I submitted recently [1].

[1] https://lore.kernel.org/linux-nvme/20240822193814.106111-3-mwilck@suse.com/T/#u

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
v3:
 - added an out label and reversed the ret logic (Sagi Grimberg)

v2: (all changes suggested by Sagi Grimberg)
 - patch subject changed from "nvme: core: freeze multipath queue early in
   nvme_update_ns_info()" to "nvme: core: shorten duration of multipath
   namespace rescan"
 - inserted comment explaining why blk_freeze_queue_start() is called early
 - wait for queue to be frozen even if ret != 0
 - make code structure more obvious vs. freeze_start / freeze_wait / unfreeze

Hannes and Daniel had already added Reviewed-by: tags to the v1 patch, but
I didn't add them above, because the patch looks quite different now.
---
 drivers/nvme/host/core.c | 88 ++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 36 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0dc8bcc664f2..13164ca866ea 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2215,8 +2215,20 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	bool unsupported = false;
+	struct queue_limits *ns_lim;
+	struct queue_limits lim;
 	int ret;
 
+	/*
+	 * The controller queue is going to be frozen in
+	 * nvme_update_ns_info_{generic,block}(). Every freeze implies waiting
+	 * for an RCU grace period to pass. For multipath devices, we
+	 * need to freeze the multipath queue, too. Start freezing the
+	 * multipath queue now, lest we need to wait for two grace periods.
+	 */
+	if (nvme_ns_head_multipath(ns->head))
+		blk_freeze_queue_start(ns->head->disk->queue);
+
 	switch (info->ids.csi) {
 	case NVME_CSI_ZNS:
 		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
@@ -2250,45 +2262,49 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		ret = 0;
 	}
 
-	if (!ret && nvme_ns_head_multipath(ns->head)) {
-		struct queue_limits *ns_lim = &ns->disk->queue->limits;
-		struct queue_limits lim;
+	if (!nvme_ns_head_multipath(ns->head))
+		return ret;
 
-		blk_mq_freeze_queue(ns->head->disk->queue);
-		/*
-		 * queue_limits mixes values that are the hardware limitations
-		 * for bio splitting with what is the device configuration.
-		 *
-		 * For NVMe the device configuration can change after e.g. a
-		 * Format command, and we really want to pick up the new format
-		 * value here.  But we must still stack the queue limits to the
-		 * least common denominator for multipathing to split the bios
-		 * properly.
-		 *
-		 * To work around this, we explicitly set the device
-		 * configuration to those that we just queried, but only stack
-		 * the splitting limits in to make sure we still obey possibly
-		 * lower limitations of other controllers.
-		 */
-		lim = queue_limits_start_update(ns->head->disk->queue);
-		lim.logical_block_size = ns_lim->logical_block_size;
-		lim.physical_block_size = ns_lim->physical_block_size;
-		lim.io_min = ns_lim->io_min;
-		lim.io_opt = ns_lim->io_opt;
-		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
-					ns->head->disk->disk_name);
-		if (unsupported)
-			ns->head->disk->flags |= GENHD_FL_HIDDEN;
-		else
-			nvme_init_integrity(ns->head, &lim, info);
-		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+	blk_mq_freeze_queue_wait(ns->head->disk->queue);
+	if (ret)
+		goto out;
 
-		set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
-		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
-		nvme_mpath_revalidate_paths(ns);
+	/*
+	 * queue_limits mixes values that are the hardware limitations
+	 * for bio splitting with what is the device configuration.
+	 *
+	 * For NVMe the device configuration can change after e.g. a
+	 * Format command, and we really want to pick up the new format
+	 * value here.  But we must still stack the queue limits to the
+	 * least common denominator for multipathing to split the bios
+	 * properly.
+	 *
+	 * To work around this, we explicitly set the device
+	 * configuration to those that we just queried, but only stack
+	 * the splitting limits in to make sure we still obey possibly
+	 * lower limitations of other controllers.
+	 */
 
-		blk_mq_unfreeze_queue(ns->head->disk->queue);
-	}
+	ns_lim = &ns->disk->queue->limits;
+	lim = queue_limits_start_update(ns->head->disk->queue);
+	lim.logical_block_size = ns_lim->logical_block_size;
+	lim.physical_block_size = ns_lim->physical_block_size;
+	lim.io_min = ns_lim->io_min;
+	lim.io_opt = ns_lim->io_opt;
+	queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
+				ns->head->disk->disk_name);
+	if (unsupported)
+		ns->head->disk->flags |= GENHD_FL_HIDDEN;
+	else
+		nvme_init_integrity(ns->head, &lim, info);
+	ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+
+	set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
+	set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
+	nvme_mpath_revalidate_paths(ns);
+
+out:
+	blk_mq_unfreeze_queue(ns->head->disk->queue);
 
 	return ret;
 }
-- 
2.46.0




* Re: [PATCH v3] nvme: core: shorten duration of multipath namespace rescan
  2024-08-26 16:39 [PATCH v3] nvme: core: shorten duration of multipath namespace rescan Martin Wilck
@ 2024-08-27  6:33 ` Christoph Hellwig
  2024-08-27 15:42   ` Martin Wilck
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2024-08-27  6:33 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Niklas Cassel, Hannes Reinecke, Daniel Wagner, Stuart Hayes,
	linux-nvme, Martin Wilck

> +	/*
> +	 * The controller queue is going to be frozen in
> +	 * nvme_update_ns_info_{generic,block}(). Every freeze implies waiting
> +	 * for an RCU grace period to pass. For multipath devices, we
> +	 * need to freeze the multipath queue, too. Start freezing the
> +	 * multipath queue now, lest we need to wait for two grace periods.
> +	 */
> +	if (nvme_ns_head_multipath(ns->head))
> +		blk_freeze_queue_start(ns->head->disk->queue);

I really don't like how we keep this state around the calls into
the command set specific nvme_update_ns_info_* calls, and how it
feels very open coded.

And while looking at it, I think we might even want the multipath
queue frozen over the actual update as well, as there is no point
in sending the I/O even to other paths while we are updating the
information, as they'll probably fail.

Here is what might make sense:

 1) move the code past the csi switch statement in nvme_update_ns_info
    into a new nvme_update_ns_info_common helper and call that
    from nvme_update_ns_info_block and nvme_update_ns_info_generic.
    Maybe having another helper just for the multipath updates might
    also be worthwhile.
 2) add new nvme_freeze_ns helpers that freeze the ns and nshead
    queues and use those in nvme_update_ns_info_block and
    nvme_update_ns_info_generic over the entire update
 3) update these new helpers to use the trick from this patch to
    only require a single grace period.
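
One way steps 1) and 2) could be sketched (helper names and exact shapes
are hypothetical, for illustration only, not existing kernel code):

```c
/* Hypothetical sketch of the suggested helpers; names and
 * signatures are assumptions, not actual code.
 */
static void nvme_ns_freeze(struct nvme_ns *ns)
{
	/* per step 3), start the head freeze first so the two
	 * RCU grace periods overlap */
	if (nvme_ns_head_multipath(ns->head))
		blk_freeze_queue_start(ns->head->disk->queue);
	blk_mq_freeze_queue(ns->queue);
	if (nvme_ns_head_multipath(ns->head))
		blk_mq_freeze_queue_wait(ns->head->disk->queue);
}

static void nvme_ns_unfreeze(struct nvme_ns *ns)
{
	if (nvme_ns_head_multipath(ns->head))
		blk_mq_unfreeze_queue(ns->head->disk->queue);
	blk_mq_unfreeze_queue(ns->queue);
}
```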




* Re: [PATCH v3] nvme: core: shorten duration of multipath namespace rescan
  2024-08-27  6:33 ` Christoph Hellwig
@ 2024-08-27 15:42   ` Martin Wilck
  2024-08-29  6:30     ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Wilck @ 2024-08-27 15:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, Niklas Cassel,
	Hannes Reinecke, Daniel Wagner, Stuart Hayes, linux-nvme

On Tue, 2024-08-27 at 08:33 +0200, Christoph Hellwig wrote:
> > +	/*
> > +	 * The controller queue is going to be frozen in
> > +	 * nvme_update_ns_info_{generic,block}(). Every freeze
> > implies waiting
> > +	 * for an RCU grace period to pass. For multipath devices,
> > we
> > +	 * need to freeze the multipath queue, too. Start freezing
> > the
> > +	 * multipath queue now, lest we need to wait for two grace
> > periods.
> > +	 */
> > +	if (nvme_ns_head_multipath(ns->head))
> > +		blk_freeze_queue_start(ns->head->disk->queue);
> 
> I really don't like how we keep this state around the calls into
> the command set specific nvme_update_ns_info_* calls, and how it
> feels very open coded.
> 
> And while looking at it, I think we might even want the multipath
> queue frozen over the actual update as well, as there is no point
> in sending the I/O even to other paths while we are updating the
> information, as they'll probably fail.
> 
> Here is what might make sense:
> 
>  1) move the code past the csi switch statement in
> nvme_update_ns_info
>     into a new nvme_update_ns_info_common helper and call that
>     from nvme_update_ns_info_block and nvme_update_ns_info_generic.
>     Maybe having another helper just for the multipath updates might
>     also be worthwhile.
>  2) add new nvme_freeze_ns helpers that freeze the ns and nshead
>     queues and use those in nvme_update_ns_info_block and
>     nvme_update_ns_info_generic over the entire update

Not sure if I understood you correctly, but a single helper for
unfreezing both the path and the multipath queues won't work. The
following code block

        if (blk_queue_is_zoned(ns->queue)) {
                ret = blk_revalidate_disk_zones(ns->disk);
                if (ret && !nvme_first_scan(ns->disk))
                        goto out;
        }

must be called from nvme_update_ns_info_block() after unfreezing the
path queue, but its result is used in the code that must run with the
multipath queue still frozen. So we can freeze both queues at the same
time, but we must unfreeze them separately.
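
So the ordering constraint looks roughly like this (a sketch; only the
relative order of the calls matters):

```c
/* Sketch of the required ordering; not actual code. */
blk_freeze_queue_start(head_q);			/* start head freeze ...    */
blk_mq_freeze_queue(ns->queue);			/* ... and freeze the path  */
blk_mq_freeze_queue_wait(head_q);
/* ... update the path queue limits ... */
blk_mq_unfreeze_queue(ns->queue);		/* path first, so that ...  */
if (blk_queue_is_zoned(ns->queue))
	ret = blk_revalidate_disk_zones(ns->disk); /* ... this can do I/O   */
/* ... propagate the result to the multipath node while head_q
 * is still frozen ... */
blk_mq_unfreeze_queue(head_q);			/* head unfrozen last       */
```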

Still trying to work this out.

Martin




* Re: [PATCH v3] nvme: core: shorten duration of multipath namespace rescan
  2024-08-27 15:42   ` Martin Wilck
@ 2024-08-29  6:30     ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2024-08-29  6:30 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg,
	Niklas Cassel, Hannes Reinecke, Daniel Wagner, Stuart Hayes,
	linux-nvme

On Tue, Aug 27, 2024 at 05:42:52PM +0200, Martin Wilck wrote:
> Not sure if I understood you correctly, but a single helper for
> unfreezing both the path and the multipath queues won't work. The
> following code block
> 
>         if (blk_queue_is_zoned(ns->queue)) {
>                 ret = blk_revalidate_disk_zones(ns->disk);
>                 if (ret && !nvme_first_scan(ns->disk))
>                         goto out;
>         }
> 
> must be called from nvme_update_ns_info_block() after unfreezing the
> path queue, but its result is used in the code that must run with the
> multipath queue still frozen. So we can freeze both queues at the same
> time, but we must unfreeze them separately.

I don't see what we'd use from blk_revalidate_disk_zones in the
propagation to the multipath node.   I've written up my suggested
prep steps as very lightly tested patches, and they seem to work
fine.  I'll send them out.




