* [PATCH v2 0/2] ublk: speed up ublk server exit handling
From: Uday Shankar @ 2025-07-04  5:41 UTC
  To: Ming Lei, Jens Axboe, Caleb Sander Mateos
  Cc: linux-block, linux-kernel, Uday Shankar

Recently, we've observed a few cases where a ublk server is able to
complete a restart more quickly than the driver can process the exit of
the previous ublk server. The new ublk server comes up, attempts
recovery of the preexisting ublk devices, and observes them still in
state UBLK_S_DEV_LIVE. While this is possible due to the asynchronous
nature of io_uring cleanup and should therefore be handled properly in
the ublk server, it is still preferable to make ublk server exit
handling faster if possible, as we should strive for it not to be a
limiting factor in how fast a ublk server can restart and provide
service again.
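
As an aside (not part of this series): a ublk server can tolerate that
window today by retrying recovery until the old daemon's exit has been
fully processed. A minimal user-space sketch of such a loop, where
try_recover_device() is a hypothetical stand-in for the server's actual
recovery path (which would issue the user recovery control commands and
see the device still in state UBLK_S_DEV_LIVE while the exit drains):

#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a ublk server's per-device recovery path;
 * stubbed to always report "old daemon exit still pending" so this
 * sketch is self-contained.
 */
static int try_recover_device(int dev_id)
{
	(void)dev_id;
	return -EBUSY;
}

static int recover_with_retry(int dev_id, int max_attempts)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int i, ret;

	for (i = 0; i < max_attempts; i++) {
		ret = try_recover_device(dev_id);
		if (ret != -EBUSY)
			return ret; /* recovered, or a real error */
		nanosleep(&delay, NULL); /* back off while the exit drains */
	}
	return -ETIMEDOUT;
}

int main(void)
{
	fprintf(stderr, "recovery: %d\n", recover_with_retry(0, 5));
	return 0;
}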

The first patch contains the functional change that speeds up the exit
handling, and the second factors out a repeating and error-prone pattern
into a helper.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
Changes in v2:
- Added back ublk_queue::canceling to minimize number of cache lines
  accessed by hot path (Ming Lei)
- Added a ublk_set_canceling helper
- Link to v1: https://lore.kernel.org/r/20250627-ublk_too_many_quiesce-v1-1-55ef9d80a6af@purestorage.com

---
Uday Shankar (2):
      ublk: speed up ublk server exit handling
      ublk: introduce and use ublk_set_canceling helper

 drivers/block/ublk_drv.c | 76 ++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 28 deletions(-)
---
base-commit: e74a1c6a8e8af2422fce125c29b14f1d3fab5b5c
change-id: 20250627-ublk_too_many_quiesce-937d6c36ce46

Best regards,
-- 
Uday Shankar <ushankar@purestorage.com>



* [PATCH v2 1/2] ublk: speed up ublk server exit handling
From: Uday Shankar @ 2025-07-04  5:41 UTC
  To: Ming Lei, Jens Axboe, Caleb Sander Mateos
  Cc: linux-block, linux-kernel, Uday Shankar

Recently, we've observed a few cases where a ublk server is able to
complete a restart more quickly than the driver can process the exit of
the previous ublk server. The new ublk server comes up, attempts
recovery of the preexisting ublk devices, and observes them still in
state UBLK_S_DEV_LIVE. While this is possible due to the asynchronous
nature of io_uring cleanup and should therefore be handled properly in
the ublk server, it is still preferable to make ublk server exit
handling faster if possible, as we should strive for it not to be a
limiting factor in how fast a ublk server can restart and provide
service again.

Analysis of the issue showed that the vast majority of the time spent in
handling the ublk server exit was in calls to blk_mq_quiesce_queue,
which is essentially just a (relatively expensive) call to
synchronize_rcu. The ublk server exit path currently issues an
unnecessarily large number of calls to blk_mq_quiesce_queue, for two
reasons:

1. It tries to call blk_mq_quiesce_queue once per ublk_queue. However,
   blk_mq_quiesce_queue targets the request_queue of the underlying ublk
   device, of which there is only one. So the number of calls is larger
   than necessary by a factor of nr_hw_queues.
2. In practice, it calls blk_mq_quiesce_queue _more_ than once per
   ublk_queue. This is because of a data race where we read
   ubq->canceling without any locking when deciding if we should call
   ublk_start_cancel. It is thus possible for two calls to
   ublk_uring_cmd_cancel_fn against the same ublk_queue to both call
   ublk_start_cancel for that queue (see the sketch below).
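
To make (2) concrete, here is a user-space analog of that check-then-act
pattern (hypothetical names, not the driver code itself): two racers can
both observe the flag clear and both pay for the expensive call.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool canceling;    /* analog of ubq->canceling */
static int quiesce_calls; /* counts runs of the expensive path */

static void start_cancel(void) /* analog of ublk_start_cancel */
{
	__sync_fetch_and_add(&quiesce_calls, 1);
	usleep(1000); /* stand-in for blk_mq_quiesce_queue() */
	canceling = true;
}

static void *cancel_fn(void *arg) /* analog of ublk_uring_cmd_cancel_fn */
{
	(void)arg;
	if (!canceling) /* unlocked read: both threads may see false */
		start_cancel();
	return NULL;
}

int main(void)
{
	pthread_t t[2];
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, cancel_fn, NULL);
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	/* frequently prints 2: the expensive path ran once per racer */
	printf("quiesce calls: %d\n", quiesce_calls);
	return 0;
}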

Fix this by making the "canceling" flag a per-device state. This
actually matches the existing code better, as there are several places
where the flag is set or cleared for all queues simultaneously, and
there is the general expectation that cancellation corresponds with ublk
server exit. This per-device canceling flag is then checked under a
(new) lock (addressing the data race (2) above), and the queue is
quiesced only if the flag was previously clear (addressing (1) above).
The result is just one
call to blk_mq_quiesce_queue per ublk device.
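
For contrast, the fixed shape in the same user-space analog (again with
hypothetical names; the real patch additionally quiesces the queue while
flipping the per-queue copies):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t cancel_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool dev_canceling; /* analog of the new ub->canceling */
static int quiesce_calls;

static void *cancel_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cancel_mutex);
	if (!dev_canceling) { /* checked under the lock: at most one winner */
		quiesce_calls++; /* stand-in for the quiesce/unquiesce pair */
		dev_canceling = true;
	}
	pthread_mutex_unlock(&cancel_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t[2];
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, cancel_fn, NULL);
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	printf("quiesce calls: %d\n", quiesce_calls); /* always 1 */
	return 0;
}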

To minimize the number of cache lines that are accessed in the hot path,
the per-queue canceling flag is kept. The values of the per-device
canceling flag and all per-queue canceling flags should always match.

In our setup, where one ublk server handles I/O for 128 ublk devices,
each having 24 hardware queues of depth 4096, here are the results
before and after this patch, where teardown time is measured from the
first call to io_ring_ctx_wait_and_kill to the return from the last
ublk_ch_release:

						before		after
number of calls to blk_mq_quiesce_queue:	6469		256
teardown time:					11.14s		2.44s

There are still some potential optimizations here - for instance, even
after this patch, the 256 quiesce calls work out to two per device
rather than one - but this takes care of a big chunk of the ublk server
exit handling delay.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index e52c2d1cb8383f8fe171553880c66483984da522..870d57a853a481c2443337717c50d39355804f66 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -235,6 +235,8 @@ struct ublk_device {
 	struct completion	completion;
 	unsigned int		nr_queues_ready;
 	unsigned int		nr_privileged_daemon;
+	struct mutex cancel_mutex;
+	bool canceling;
 };
 
 /* header of ublk_params */
@@ -1589,6 +1591,7 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
 	 * All requests may be inflight, so ->canceling may not be set, set
 	 * it now.
 	 */
+	ub->canceling = true;
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
@@ -1717,23 +1720,18 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 	}
 }
 
-/* Must be called when queue is frozen */
-static void ublk_mark_queue_canceling(struct ublk_queue *ubq)
-{
-	spin_lock(&ubq->cancel_lock);
-	if (!ubq->canceling)
-		ubq->canceling = true;
-	spin_unlock(&ubq->cancel_lock);
-}
-
-static void ublk_start_cancel(struct ublk_queue *ubq)
+static void ublk_start_cancel(struct ublk_device *ub)
 {
-	struct ublk_device *ub = ubq->dev;
 	struct gendisk *disk = ublk_get_disk(ub);
+	int i;
 
 	/* Our disk has been dead */
 	if (!disk)
 		return;
+
+	mutex_lock(&ub->cancel_mutex);
+	if (ub->canceling)
+		goto out;
 	/*
 	 * Now we are serialized with ublk_queue_rq()
 	 *
@@ -1742,8 +1740,12 @@ static void ublk_start_cancel(struct ublk_queue *ubq)
 	 * touch completed uring_cmd
 	 */
 	blk_mq_quiesce_queue(disk->queue);
-	ublk_mark_queue_canceling(ubq);
+	ub->canceling = true;
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_get_queue(ub, i)->canceling = true;
 	blk_mq_unquiesce_queue(disk->queue);
+out:
+	mutex_unlock(&ub->cancel_mutex);
 	ublk_put_disk(disk);
 }
 
@@ -1816,8 +1818,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	if (WARN_ON_ONCE(task && task != io->task))
 		return;
 
-	if (!ubq->canceling)
-		ublk_start_cancel(ubq);
+	ublk_start_cancel(ubq->dev);
 
 	WARN_ON_ONCE(io->cmd != cmd);
 	ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
@@ -1944,6 +1945,7 @@ static void ublk_reset_io_flags(struct ublk_device *ub)
 		ubq->canceling = false;
 		ubq->fail_io = false;
 	}
+	ub->canceling = false;
 }
 
 /* device can only be started after all IOs are ready */
@@ -2652,6 +2654,7 @@ static void ublk_cdev_rel(struct device *dev)
 	ublk_deinit_queues(ub);
 	ublk_free_dev_number(ub);
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 }
 
@@ -3004,6 +3007,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 		goto out_unlock;
 	mutex_init(&ub->mutex);
 	spin_lock_init(&ub->lock);
+	mutex_init(&ub->cancel_mutex);
 
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
 	if (ret < 0)
@@ -3075,6 +3079,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 	ublk_free_dev_number(ub);
 out_free_ub:
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 out_unlock:
 	mutex_unlock(&ublk_ctl_mutex);
@@ -3429,8 +3434,9 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
 		goto put_disk;
 
-	/* Mark all queues as canceling */
+	/* Mark the device as canceling */
 	blk_mq_quiesce_queue(disk->queue);
+	ub->canceling = true;
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 

-- 
2.34.1



* [PATCH v2 2/2] ublk: introduce and use ublk_set_canceling helper
From: Uday Shankar @ 2025-07-04  5:41 UTC
  To: Ming Lei, Jens Axboe, Caleb Sander Mateos
  Cc: linux-block, linux-kernel, Uday Shankar

For performance reasons (minimizing the number of cache lines accessed
in the hot path), we store the "canceling" state redundantly - there is
one flag in the device, which can be considered the source of truth, and
per-queue copies of that flag. This redundancy can cause confusion and
opens the door to bugs where the state is set inconsistently. Try to
guard against these bugs by introducing a ublk_set_canceling helper
which is the sole mutator of both the per-device and per-queue canceling
state. This helper always sets the state consistently. Use the helper in
all places where we need to modify the canceling state.

No functional changes are expected.
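
As an illustration of the invariant, here is a small user-space analog
of the single-mutator pattern (hypothetical names and struct layout,
mirroring but not identical to the diff below; the real helper also
relies on the queue being quiesced so hot-path readers don't race):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_QUEUES 4

struct queue {
	bool canceling; /* per-queue copy, read on the hot path */
};

struct device {
	pthread_mutex_t cancel_mutex;
	bool canceling; /* source of truth */
	struct queue queues[NR_QUEUES];
};

/* Sole mutator: keeps the device flag and every per-queue copy in
 * sync. Callers must hold dev->cancel_mutex.
 */
static void set_canceling(struct device *dev, bool canceling)
{
	int i;

	dev->canceling = canceling;
	for (i = 0; i < NR_QUEUES; i++)
		dev->queues[i].canceling = canceling;
}

int main(void)
{
	struct device dev = { .cancel_mutex = PTHREAD_MUTEX_INITIALIZER };

	pthread_mutex_lock(&dev.cancel_mutex);
	set_canceling(&dev, true); /* all copies flip together */
	pthread_mutex_unlock(&dev.cancel_mutex);

	printf("dev=%d q0=%d\n", dev.canceling, dev.queues[0].canceling);
	return 0;
}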

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 54 ++++++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 870d57a853a481c2443337717c50d39355804f66..a1a700c7e67a72597e740aaa60f5c3c73f0085e5 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1563,6 +1563,27 @@ static void ublk_put_disk(struct gendisk *disk)
 		put_device(disk_to_dev(disk));
 }
 
+/*
+ * Use this function to ensure that ->canceling is consistently set for
+ * the device and all queues. Do not set these flags directly.
+ *
+ * Caller must ensure that:
+ * - cancel_mutex is held. This ensures that there is no concurrent
+ *   access to ub->canceling and no concurrent writes to ubq->canceling.
+ * - there are no concurrent reads of ubq->canceling from the queue_rq
+ *   path. This can be done by quiescing the queue, or through other
+ *   means.
+ */
+static void ublk_set_canceling(struct ublk_device *ub, bool canceling)
+	__must_hold(&ub->cancel_mutex)
+{
+	int i;
+
+	ub->canceling = canceling;
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_get_queue(ub, i)->canceling = canceling;
+}
+
 static int ublk_ch_release(struct inode *inode, struct file *filp)
 {
 	struct ublk_device *ub = filp->private_data;
@@ -1591,13 +1612,11 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
 	 * All requests may be inflight, so ->canceling may not be set, set
 	 * it now.
 	 */
-	ub->canceling = true;
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-		struct ublk_queue *ubq = ublk_get_queue(ub, i);
-
-		ubq->canceling = true;
-		ublk_abort_queue(ub, ubq);
-	}
+	mutex_lock(&ub->cancel_mutex);
+	ublk_set_canceling(ub, true);
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_abort_queue(ub, ublk_get_queue(ub, i));
+	mutex_unlock(&ub->cancel_mutex);
 	blk_mq_kick_requeue_list(disk->queue);
 
 	/*
@@ -1723,7 +1742,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 static void ublk_start_cancel(struct ublk_device *ub)
 {
 	struct gendisk *disk = ublk_get_disk(ub);
-	int i;
 
 	/* Our disk has been dead */
 	if (!disk)
@@ -1740,9 +1758,7 @@ static void ublk_start_cancel(struct ublk_device *ub)
 	 * touch completed uring_cmd
 	 */
 	blk_mq_quiesce_queue(disk->queue);
-	ub->canceling = true;
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
-		ublk_get_queue(ub, i)->canceling = true;
+	ublk_set_canceling(ub, true);
 	blk_mq_unquiesce_queue(disk->queue);
 out:
 	mutex_unlock(&ub->cancel_mutex);
@@ -1942,10 +1958,11 @@ static void ublk_reset_io_flags(struct ublk_device *ub)
 		for (j = 0; j < ubq->q_depth; j++)
 			ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED;
 		spin_unlock(&ubq->cancel_lock);
-		ubq->canceling = false;
 		ubq->fail_io = false;
 	}
-	ub->canceling = false;
+	mutex_lock(&ub->cancel_mutex);
+	ublk_set_canceling(ub, false);
+	mutex_unlock(&ub->cancel_mutex);
 }
 
 /* device can only be started after all IOs are ready */
@@ -3417,7 +3434,7 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 	/* zero means wait forever */
 	u64 timeout_ms = header->data[0];
 	struct gendisk *disk;
-	int i, ret = -ENODEV;
+	int ret = -ENODEV;
 
 	if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
 		return -EOPNOTSUPP;
@@ -3435,14 +3452,11 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 		goto put_disk;
 
 	/* Mark the device as canceling */
+	mutex_lock(&ub->cancel_mutex);
 	blk_mq_quiesce_queue(disk->queue);
-	ub->canceling = true;
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-		struct ublk_queue *ubq = ublk_get_queue(ub, i);
-
-		ubq->canceling = true;
-	}
+	ublk_set_canceling(ub, true);
 	blk_mq_unquiesce_queue(disk->queue);
+	mutex_unlock(&ub->cancel_mutex);
 
 	if (!timeout_ms)
 		timeout_ms = UINT_MAX;

-- 
2.34.1



* Re: [PATCH v2 1/2] ublk: speed up ublk server exit handling
From: Ming Lei @ 2025-07-04 12:35 UTC
  To: Uday Shankar; +Cc: Jens Axboe, Caleb Sander Mateos, linux-block, linux-kernel

On Thu, Jul 03, 2025 at 11:41:07PM -0600, Uday Shankar wrote:
> Recently, we've observed a few cases where a ublk server is able to
> complete a restart more quickly than the driver can process the exit of
> the previous ublk server.
> 
> [...]
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>


thanks,
Ming



* Re: [PATCH v2 2/2] ublk: introduce and use ublk_set_canceling helper
From: Ming Lei @ 2025-07-04 12:36 UTC
  To: Uday Shankar; +Cc: Jens Axboe, Caleb Sander Mateos, linux-block, linux-kernel

On Thu, Jul 03, 2025 at 11:41:08PM -0600, Uday Shankar wrote:
> For performance reasons (minimizing the number of cache lines accessed
> in the hot path), we store the "canceling" state redundantly - there is
> one flag in the device, which can be considered the source of truth, and
> per-queue copies of that flag.
> 
> [...]
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

thanks,
Ming



* Re: [PATCH v2 1/2] ublk: speed up ublk server exit handling
From: Caleb Sander Mateos @ 2025-07-04 13:21 UTC
  To: Uday Shankar; +Cc: Ming Lei, Jens Axboe, linux-block, linux-kernel

On Fri, Jul 4, 2025 at 1:41 AM Uday Shankar <ushankar@purestorage.com> wrote:
>
> Recently, we've observed a few cases where a ublk server is able to
> complete a restart more quickly than the driver can process the exit of
> the previous ublk server.
>
> [...]
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>


* Re: [PATCH v2 2/2] ublk: introduce and use ublk_set_canceling helper
From: Caleb Sander Mateos @ 2025-07-04 13:30 UTC
  To: Uday Shankar; +Cc: Ming Lei, Jens Axboe, linux-block, linux-kernel

On Fri, Jul 4, 2025 at 1:41 AM Uday Shankar <ushankar@purestorage.com> wrote:
>
> [...]
>
> @@ -1591,13 +1612,11 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
>          * All requests may be inflight, so ->canceling may not be set, set
>          * it now.
>          */
> -       ub->canceling = true;
> -       for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> -               struct ublk_queue *ubq = ublk_get_queue(ub, i);
> -
> -               ubq->canceling = true;
> -               ublk_abort_queue(ub, ubq);
> -       }
> +       mutex_lock(&ub->cancel_mutex);
> +       ublk_set_canceling(ub, true);
> +       for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
> +               ublk_abort_queue(ub, ublk_get_queue(ub, i));
> +       mutex_unlock(&ub->cancel_mutex);

Pre-existing, but it doesn't look like the queue is quiesced yet here
(that happens later in this function).

Best,
Caleb


* Re: [PATCH v2 0/2] ublk: speed up ublk server exit handling
From: Jens Axboe @ 2025-07-04 15:31 UTC
  To: Ming Lei, Caleb Sander Mateos, Uday Shankar; +Cc: linux-block, linux-kernel


On Thu, 03 Jul 2025 23:41:06 -0600, Uday Shankar wrote:
> Recently, we've observed a few cases where a ublk server is able to
> complete a restart more quickly than the driver can process the exit of
> the previous ublk server.
> 
> [...]

Applied, thanks!

[1/2] ublk: speed up ublk server exit handling
      commit: 2fa9c93035e17380cafa897ee1a4d503881a3770
[2/2] ublk: introduce and use ublk_set_canceling helper
      commit: 10d77a8c60b2b117868a64875a55c4c8db6f1f2e

Best regards,
-- 
Jens Axboe



