[PATCH 0/4] ublk: support device recovery without I/O queueing

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] ublk: support device recovery without I/O queueing
@ 2024-06-17 19:44 Uday Shankar
  2024-06-17 19:44 ` [PATCH 1/4] ublk: check recovery flags for validity Uday Shankar
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Uday Shankar @ 2024-06-17 19:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Uday Shankar, linux-block

ublk currently supports the following behaviors on ublk server exit:

A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue

and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:

1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands

The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:

default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2

A + 2 is a currently unsupported behavior. This patch series aims to add
support for it.

Uday Shankar (4):
  ublk: check recovery flags for validity
  ublk: refactor recovery configuration flag helpers
  ublk: merge stop_work and quiesce_work
  ublk: support device recovery without I/O queueing

 drivers/block/ublk_drv.c      | 169 +++++++++++++++++++++-------------
 include/uapi/linux/ublk_cmd.h |  18 ++++
 2 files changed, 125 insertions(+), 62 deletions(-)


base-commit: 7c7a0732285c9dbdc52c04241a518df0367fd116
-- 
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] ublk: check recovery flags for validity
  2024-06-17 19:44 [PATCH 0/4] ublk: support device recovery without I/O queueing Uday Shankar
@ 2024-06-17 19:44 ` Uday Shankar
  2024-06-18  2:00   ` Ming Lei
  2024-06-17 19:44 ` [PATCH 2/4] ublk: refactor recovery configuration flag helpers Uday Shankar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-06-17 19:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Uday Shankar, linux-block

Only certain combinations of recovery flags are valid. For example,
setting UBLK_F_USER_RECOVERY_REISSUE without also setting
UBLK_F_USER_RECOVERY is currently silently equivalent to not setting any
recovery flags. Check for such issues and fail add_dev if they are
detected.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4e159948c912..2752a9afe9d4 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -59,6 +59,9 @@
 		| UBLK_F_USER_COPY \
 		| UBLK_F_ZONED)
 
+#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
+		| UBLK_F_USER_RECOVERY_REISSUE)
+
 /* All UBLK_PARAM_TYPE_* should be included here */
 #define UBLK_PARAM_TYPE_ALL                                \
 	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
@@ -2341,6 +2344,18 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 	else if (!(info.flags & UBLK_F_UNPRIVILEGED_DEV))
 		return -EPERM;
 
+	/* forbid nonsense combinations of recovery flags */
+	switch (info.flags & UBLK_F_ALL_RECOVERY_FLAGS) {
+	case 0:
+	case UBLK_F_USER_RECOVERY:
+	case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
+		break;
+	default:
+		pr_warn("%s: invalid recovery flags %llx\n", __func__,
+			info.flags & UBLK_F_ALL_RECOVERY_FLAGS);
+		return -EINVAL;
+	}
+
 	/*
 	 * unprivileged device can't be trusted, but RECOVERY and
 	 * RECOVERY_REISSUE still may hang error handling, so can't
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-17 19:44 [PATCH 0/4] ublk: support device recovery without I/O queueing Uday Shankar
  2024-06-17 19:44 ` [PATCH 1/4] ublk: check recovery flags for validity Uday Shankar
@ 2024-06-17 19:44 ` Uday Shankar
  2024-06-18  2:11   ` Ming Lei
  2024-07-02 11:09   ` Ming Lei
  2024-06-17 19:44 ` [PATCH 3/4] ublk: merge stop_work and quiesce_work Uday Shankar
  2024-06-17 19:44 ` [PATCH 4/4] ublk: support device recovery without I/O queueing Uday Shankar
  3 siblings, 2 replies; 19+ messages in thread
From: Uday Shankar @ 2024-06-17 19:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Uday Shankar, linux-block

ublk currently supports the following behaviors on ublk server exit:

A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue

and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:

1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands

The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:

default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2

We can't easily change the userspace interface to allow independent
selection of one of {A, B, C} and one of {1, 2}, but we can refactor the
internal helpers which test for the flags. Replace the existing helpers
with the following set:

ublk_nosrv_should_reissue_outstanding: tests for behavior C
ublk_nosrv_should_queue_io: tests for behavior B
ublk_nosrv_should_stop_dev: tests for behavior 1

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 55 +++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2752a9afe9d4..e8cca58a71bc 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -652,22 +652,35 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
 			PAGE_SIZE);
 }
 
-static inline bool ublk_queue_can_use_recovery_reissue(
-		struct ublk_queue *ubq)
+/*
+ * Should I/O outstanding to the ublk server when it exits be reissued?
+ * If not, outstanding I/O will get errors.
+ */
+static inline bool ublk_nosrv_should_reissue_outstanding(struct ublk_device *ub)
 {
-	return (ubq->flags & UBLK_F_USER_RECOVERY) &&
-			(ubq->flags & UBLK_F_USER_RECOVERY_REISSUE);
+	return (ub->dev_info.flags & UBLK_F_USER_RECOVERY) &&
+	       (ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE);
 }
 
-static inline bool ublk_queue_can_use_recovery(
-		struct ublk_queue *ubq)
+/*
+ * Should I/O issued while there is no ublk server queue? If not, I/O
+ * issued while there is no ublk server will get errors.
+ */
+static inline bool ublk_nosrv_should_queue_io(struct ublk_device *ub)
 {
-	return ubq->flags & UBLK_F_USER_RECOVERY;
+	return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
 }
 
-static inline bool ublk_can_use_recovery(struct ublk_device *ub)
+/*
+ * Should ublk devices be stopped (i.e. no recovery possible) when the
+ * ublk server exits? If not, devices can be used again by a future
+ * incarnation of a ublk server via the start_recovery/end_recovery
+ * commands.
+ */
+static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
 {
-	return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
+	return (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY)) &&
+	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE));
 }
 
 static void ublk_free_disk(struct gendisk *disk)
@@ -1043,7 +1056,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
 {
 	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
 
-	if (ublk_queue_can_use_recovery_reissue(ubq))
+	if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
 		blk_mq_requeue_request(req, false);
 	else
 		ublk_put_req_ref(ubq, req);
@@ -1071,7 +1084,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 		struct request *rq)
 {
 	/* We cannot process this rq so just requeue it. */
-	if (ublk_queue_can_use_recovery(ubq))
+	if (ublk_nosrv_should_queue_io(ubq->dev))
 		blk_mq_requeue_request(rq, false);
 	else
 		blk_mq_end_request(rq, BLK_STS_IOERR);
@@ -1216,10 +1229,10 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 		struct ublk_device *ub = ubq->dev;
 
 		if (ublk_abort_requests(ub, ubq)) {
-			if (ublk_can_use_recovery(ub))
-				schedule_work(&ub->quiesce_work);
-			else
+			if (ublk_nosrv_should_stop_dev(ub))
 				schedule_work(&ub->stop_work);
+			else
+				schedule_work(&ub->quiesce_work);
 		}
 		return BLK_EH_DONE;
 	}
@@ -1248,7 +1261,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * Note: force_abort is guaranteed to be seen because it is set
 	 * before request queue is unqiuesced.
 	 */
-	if (ublk_queue_can_use_recovery(ubq) && unlikely(ubq->force_abort))
+	if (ublk_nosrv_should_queue_io(ubq->dev) && unlikely(ubq->force_abort))
 		return BLK_STS_IOERR;
 
 	if (unlikely(ubq->canceling)) {
@@ -1469,10 +1482,10 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	ublk_cancel_cmd(ubq, io, issue_flags);
 
 	if (need_schedule) {
-		if (ublk_can_use_recovery(ub))
-			schedule_work(&ub->quiesce_work);
-		else
+		if (ublk_nosrv_should_stop_dev(ub))
 			schedule_work(&ub->stop_work);
+		else
+			schedule_work(&ub->quiesce_work);
 	}
 }
 
@@ -1577,7 +1590,7 @@ static void ublk_stop_dev(struct ublk_device *ub)
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state == UBLK_S_DEV_DEAD)
 		goto unlock;
-	if (ublk_can_use_recovery(ub)) {
+	if (ublk_nosrv_should_queue_io(ub)) {
 		if (ub->dev_info.state == UBLK_S_DEV_LIVE)
 			__ublk_quiesce_dev(ub);
 		ublk_unquiesce_dev(ub);
@@ -2674,7 +2687,7 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
 	int i;
 
 	mutex_lock(&ub->mutex);
-	if (!ublk_can_use_recovery(ub))
+	if (ublk_nosrv_should_stop_dev(ub))
 		goto out_unlock;
 	/*
 	 * START_RECOVERY is only allowd after:
@@ -2725,7 +2738,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 			__func__, ub->dev_info.nr_hw_queues, header->dev_id);
 
 	mutex_lock(&ub->mutex);
-	if (!ublk_can_use_recovery(ub))
+	if (ublk_nosrv_should_stop_dev(ub))
 		goto out_unlock;
 
 	if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] ublk: merge stop_work and quiesce_work
  2024-06-17 19:44 [PATCH 0/4] ublk: support device recovery without I/O queueing Uday Shankar
  2024-06-17 19:44 ` [PATCH 1/4] ublk: check recovery flags for validity Uday Shankar
  2024-06-17 19:44 ` [PATCH 2/4] ublk: refactor recovery configuration flag helpers Uday Shankar
@ 2024-06-17 19:44 ` Uday Shankar
  2024-07-02 13:31   ` Ming Lei
  2024-06-17 19:44 ` [PATCH 4/4] ublk: support device recovery without I/O queueing Uday Shankar
  3 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-06-17 19:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Uday Shankar, linux-block

Save some lines by merging stop_work and quiesce_work into nosrv_work,
which looks at the recovery flags and does the right thing when the "no
ublk server" condition is detected.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c | 64 ++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 39 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index e8cca58a71bc..0496fa372cc1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -182,8 +182,7 @@ struct ublk_device {
 	unsigned int		nr_queues_ready;
 	unsigned int		nr_privileged_daemon;
 
-	struct work_struct	quiesce_work;
-	struct work_struct	stop_work;
+	struct work_struct	nosrv_work;
 };
 
 /* header of ublk_params */
@@ -1229,10 +1228,7 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 		struct ublk_device *ub = ubq->dev;
 
 		if (ublk_abort_requests(ub, ubq)) {
-			if (ublk_nosrv_should_stop_dev(ub))
-				schedule_work(&ub->stop_work);
-			else
-				schedule_work(&ub->quiesce_work);
+			schedule_work(&ub->nosrv_work);
 		}
 		return BLK_EH_DONE;
 	}
@@ -1482,10 +1478,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	ublk_cancel_cmd(ubq, io, issue_flags);
 
 	if (need_schedule) {
-		if (ublk_nosrv_should_stop_dev(ub))
-			schedule_work(&ub->stop_work);
-		else
-			schedule_work(&ub->quiesce_work);
+		schedule_work(&ub->nosrv_work);
 	}
 }
 
@@ -1548,20 +1541,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub)
 	ub->dev_info.state = UBLK_S_DEV_QUIESCED;
 }
 
-static void ublk_quiesce_work_fn(struct work_struct *work)
-{
-	struct ublk_device *ub =
-		container_of(work, struct ublk_device, quiesce_work);
-
-	mutex_lock(&ub->mutex);
-	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
-		goto unlock;
-	__ublk_quiesce_dev(ub);
- unlock:
-	mutex_unlock(&ub->mutex);
-	ublk_cancel_dev(ub);
-}
-
 static void ublk_unquiesce_dev(struct ublk_device *ub)
 {
 	int i;
@@ -1610,6 +1589,25 @@ static void ublk_stop_dev(struct ublk_device *ub)
 	ublk_cancel_dev(ub);
 }
 
+static void ublk_nosrv_work(struct work_struct *work)
+{
+	struct ublk_device *ub =
+		container_of(work, struct ublk_device, nosrv_work);
+
+	if (ublk_nosrv_should_stop_dev(ub)) {
+		ublk_stop_dev(ub);
+		return;
+	}
+
+	mutex_lock(&ub->mutex);
+	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+		goto unlock;
+	__ublk_quiesce_dev(ub);
+ unlock:
+	mutex_unlock(&ub->mutex);
+	ublk_cancel_dev(ub);
+}
+
 /* device can only be started after all IOs are ready */
 static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
 {
@@ -2124,14 +2122,6 @@ static int ublk_add_chdev(struct ublk_device *ub)
 	return ret;
 }
 
-static void ublk_stop_work_fn(struct work_struct *work)
-{
-	struct ublk_device *ub =
-		container_of(work, struct ublk_device, stop_work);
-
-	ublk_stop_dev(ub);
-}
-
 /* align max io buffer size with PAGE_SIZE */
 static void ublk_align_max_io_size(struct ublk_device *ub)
 {
@@ -2156,8 +2146,7 @@ static int ublk_add_tag_set(struct ublk_device *ub)
 static void ublk_remove(struct ublk_device *ub)
 {
 	ublk_stop_dev(ub);
-	cancel_work_sync(&ub->stop_work);
-	cancel_work_sync(&ub->quiesce_work);
+	cancel_work_sync(&ub->nosrv_work);
 	cdev_device_del(&ub->cdev, &ub->cdev_dev);
 	ublk_put_device(ub);
 	ublks_added--;
@@ -2412,8 +2401,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 		goto out_unlock;
 	mutex_init(&ub->mutex);
 	spin_lock_init(&ub->lock);
-	INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
-	INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
+	INIT_WORK(&ub->nosrv_work, ublk_nosrv_work);
 
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
 	if (ret < 0)
@@ -2548,9 +2536,7 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd)
 static int ublk_ctrl_stop_dev(struct ublk_device *ub)
 {
 	ublk_stop_dev(ub);
-	cancel_work_sync(&ub->stop_work);
-	cancel_work_sync(&ub->quiesce_work);
-
+	cancel_work_sync(&ub->nosrv_work);
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] ublk: support device recovery without I/O queueing
  2024-06-17 19:44 [PATCH 0/4] ublk: support device recovery without I/O queueing Uday Shankar
                   ` (2 preceding siblings ...)
  2024-06-17 19:44 ` [PATCH 3/4] ublk: merge stop_work and quiesce_work Uday Shankar
@ 2024-06-17 19:44 ` Uday Shankar
  2024-07-02 13:46   ` Ming Lei
  3 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-06-17 19:44 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: Uday Shankar, linux-block

ublk currently supports the following behaviors on ublk server exit:

A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue

and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:

1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands

The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:

default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2

The behavior A + 2 is currently unsupported. Add support for this
behavior under the new flag UBLK_F_USER_RECOVERY_NOQUEUE.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 drivers/block/ublk_drv.c      | 53 +++++++++++++++++++++++++++--------
 include/uapi/linux/ublk_cmd.h | 18 ++++++++++++
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 0496fa372cc1..4fec8b48d30e 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -57,10 +57,12 @@
 		| UBLK_F_UNPRIVILEGED_DEV \
 		| UBLK_F_CMD_IOCTL_ENCODE \
 		| UBLK_F_USER_COPY \
-		| UBLK_F_ZONED)
+		| UBLK_F_ZONED \
+		| UBLK_F_USER_RECOVERY_NOQUEUE)
 
 #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
-		| UBLK_F_USER_RECOVERY_REISSUE)
+		| UBLK_F_USER_RECOVERY_REISSUE \
+		| UBLK_F_USER_RECOVERY_NOQUEUE)
 
 /* All UBLK_PARAM_TYPE_* should be included here */
 #define UBLK_PARAM_TYPE_ALL                                \
@@ -679,7 +681,14 @@ static inline bool ublk_nosrv_should_queue_io(struct ublk_device *ub)
 static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
 {
 	return (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY)) &&
-	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE));
+	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE)) &&
+	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_NOQUEUE));
+}
+
+static inline bool ublk_dev_in_recoverable_state(struct ublk_device *ub)
+{
+	return ub->dev_info.state == UBLK_S_DEV_QUIESCED ||
+	       ub->dev_info.state == UBLK_S_DEV_FAIL_IO;
 }
 
 static void ublk_free_disk(struct gendisk *disk)
@@ -1243,6 +1252,11 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request *rq = bd->rq;
 	blk_status_t res;
 
+	if (ubq->dev->dev_info.state == UBLK_S_DEV_FAIL_IO) {
+		return BLK_STS_TARGET;
+	}
+	WARN_ON_ONCE(ubq->dev->dev_info.state != UBLK_S_DEV_LIVE);
+
 	/* fill iod to slot in io cmd buffer */
 	res = ublk_setup_iod(ubq, rq);
 	if (unlikely(res != BLK_STS_OK))
@@ -1602,7 +1616,15 @@ static void ublk_nosrv_work(struct work_struct *work)
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
 		goto unlock;
-	__ublk_quiesce_dev(ub);
+
+	if (ublk_nosrv_should_queue_io(ub)) {
+		__ublk_quiesce_dev(ub);
+	} else {
+		blk_mq_quiesce_queue(ub->ub_disk->queue);
+		ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
+		blk_mq_unquiesce_queue(ub->ub_disk->queue);
+	}
+
  unlock:
 	mutex_unlock(&ub->mutex);
 	ublk_cancel_dev(ub);
@@ -2351,6 +2373,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 	case 0:
 	case UBLK_F_USER_RECOVERY:
 	case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
+	case UBLK_F_USER_RECOVERY_NOQUEUE:
 		break;
 	default:
 		pr_warn("%s: invalid recovery flags %llx\n", __func__,
@@ -2682,14 +2705,18 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
 	 *     and related io_uring ctx is freed so file struct of /dev/ublkcX is
 	 *     released.
 	 *
+	 * and one of the following holds
+	 *
 	 * (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
 	 *     (a)has quiesced request queue
 	 *     (b)has requeued every inflight rqs whose io_flags is ACTIVE
 	 *     (c)has requeued/aborted every inflight rqs whose io_flags is NOT ACTIVE
 	 *     (d)has completed/camceled all ioucmds owned by ther dying process
+	 *
+	 * (3) UBLK_S_DEV_FAIL_IO is set, which means the queue is not
+	 *     quiesced, but all I/O is being immediately errored
 	 */
-	if (test_bit(UB_STATE_OPEN, &ub->state) ||
-			ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+	if (test_bit(UB_STATE_OPEN, &ub->state) || !ublk_dev_in_recoverable_state(ub)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
@@ -2727,18 +2754,22 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 	if (ublk_nosrv_should_stop_dev(ub))
 		goto out_unlock;
 
-	if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+	if (!ublk_dev_in_recoverable_state(ub)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
 	ub->dev_info.ublksrv_pid = ublksrv_pid;
 	pr_devel("%s: new ublksrv_pid %d, dev id %d\n",
 			__func__, ublksrv_pid, header->dev_id);
-	blk_mq_unquiesce_queue(ub->ub_disk->queue);
-	pr_devel("%s: queue unquiesced, dev id %d.\n",
-			__func__, header->dev_id);
-	blk_mq_kick_requeue_list(ub->ub_disk->queue);
+
 	ub->dev_info.state = UBLK_S_DEV_LIVE;
+	if (ublk_nosrv_should_queue_io(ub)) {
+		blk_mq_unquiesce_queue(ub->ub_disk->queue);
+		pr_devel("%s: queue unquiesced, dev id %d.\n",
+				__func__, header->dev_id);
+		blk_mq_kick_requeue_list(ub->ub_disk->queue);
+	}
+
 	ret = 0;
  out_unlock:
 	mutex_unlock(&ub->mutex);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index c8dc5f8ea699..c4512b3a3c52 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -147,8 +147,18 @@
  */
 #define UBLK_F_NEED_GET_DATA (1UL << 2)
 
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is met with errors
+ * - I/O issued while there is no ublk server queues
+ */
 #define UBLK_F_USER_RECOVERY	(1UL << 3)
 
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is reissued
+ * - I/O issued while there is no ublk server queues
+ */
 #define UBLK_F_USER_RECOVERY_REISSUE	(1UL << 4)
 
 /*
@@ -184,10 +194,18 @@
  */
 #define UBLK_F_ZONED (1ULL << 8)
 
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is met with errors
+ * - I/O issued while there is no ublk server is met with errors
+ */
+#define UBLK_F_USER_RECOVERY_NOQUEUE (1ULL << 9)
+
 /* device state */
 #define UBLK_S_DEV_DEAD	0
 #define UBLK_S_DEV_LIVE	1
 #define UBLK_S_DEV_QUIESCED	2
+#define UBLK_S_DEV_FAIL_IO 3
 
 /* shipped via sqe->cmd of io_uring command */
 struct ublksrv_ctrl_cmd {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] ublk: check recovery flags for validity
  2024-06-17 19:44 ` [PATCH 1/4] ublk: check recovery flags for validity Uday Shankar
@ 2024-06-18  2:00   ` Ming Lei
  2024-07-23 19:32     ` Uday Shankar
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-06-18  2:00 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block

On Mon, Jun 17, 2024 at 01:44:48PM -0600, Uday Shankar wrote:
> Only certain combinations of recovery flags are valid. For example,
> setting UBLK_F_USER_RECOVERY_REISSUE without also setting
> UBLK_F_USER_RECOVERY is currently silently equivalent to not setting any
> recovery flags. Check for such issues and fail add_dev if they are
> detected.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
>  drivers/block/ublk_drv.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 4e159948c912..2752a9afe9d4 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -59,6 +59,9 @@
>  		| UBLK_F_USER_COPY \
>  		| UBLK_F_ZONED)
>  
> +#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> +		| UBLK_F_USER_RECOVERY_REISSUE)
> +
>  /* All UBLK_PARAM_TYPE_* should be included here */
>  #define UBLK_PARAM_TYPE_ALL                                \
>  	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> @@ -2341,6 +2344,18 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>  	else if (!(info.flags & UBLK_F_UNPRIVILEGED_DEV))
>  		return -EPERM;
>  
> +	/* forbid nonsense combinations of recovery flags */
> +	switch (info.flags & UBLK_F_ALL_RECOVERY_FLAGS) {
> +	case 0:
> +	case UBLK_F_USER_RECOVERY:
> +	case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
> +		break;
> +	default:
> +		pr_warn("%s: invalid recovery flags %llx\n", __func__,
> +			info.flags & UBLK_F_ALL_RECOVERY_FLAGS);
> +		return -EINVAL;
> +	}
> +

It could be cleaner and more readable to check the fail condition only:

	if ((info.flags & UBLK_F_USER_RECOVERY_REISSUE) &&
		!(info.flags & UBLK_F_USER_RECOVERY)) {
		...
	}


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-17 19:44 ` [PATCH 2/4] ublk: refactor recovery configuration flag helpers Uday Shankar
@ 2024-06-18  2:11   ` Ming Lei
  2024-06-26 17:22     ` Uday Shankar
  2024-07-02 11:09   ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-06-18  2:11 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block

On Mon, Jun 17, 2024 at 01:44:49PM -0600, Uday Shankar wrote:
> ublk currently supports the following behaviors on ublk server exit:
> 
> A: outstanding I/Os get errors, subsequently issued I/Os get errors
> B: outstanding I/Os get errors, subsequently issued I/Os queue
> C: outstanding I/Os get reissued, subsequently issued I/Os queue
> 
> and the following behaviors for recovery of preexisting block devices by
> a future incarnation of the ublk server:
> 
> 1: ublk devices stopped on ublk server exit (no recovery possible)
> 2: ublk devices are recoverable using start/end_recovery commands
> 
> The userspace interface allows selection of combinations of these
> behaviors using flags specified at device creation time, namely:
> 
> default behavior: A + 1
> UBLK_F_USER_RECOVERY: B + 2
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2

ublk is supposed to support A, B & C for both 1 and both 2, but it may
depend on how ublk server is implemented.

In cover letter, it is mentioned that "A + 2 is a currently unsupported
behavior", can you explain it a bit? Such as, how does ublk server
handle the I/O error? And when/how to recover? why doesn't this way
work?

For example, one rough way is to kill the daemon in case of A, then recovery
can be started and new daemon is created for handling IO.

But we are open to improve this area to support more flexible ublk
server implementation.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-18  2:11   ` Ming Lei
@ 2024-06-26 17:22     ` Uday Shankar
  2024-06-27  1:17       ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-06-26 17:22 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Tue, Jun 18, 2024 at 10:11:51AM +0800, Ming Lei wrote:
> On Mon, Jun 17, 2024 at 01:44:49PM -0600, Uday Shankar wrote:
> > ublk currently supports the following behaviors on ublk server exit:
> > 
> > A: outstanding I/Os get errors, subsequently issued I/Os get errors
> > B: outstanding I/Os get errors, subsequently issued I/Os queue
> > C: outstanding I/Os get reissued, subsequently issued I/Os queue
> > 
> > and the following behaviors for recovery of preexisting block devices by
> > a future incarnation of the ublk server:
> > 
> > 1: ublk devices stopped on ublk server exit (no recovery possible)
> > 2: ublk devices are recoverable using start/end_recovery commands
> > 
> > The userspace interface allows selection of combinations of these
> > behaviors using flags specified at device creation time, namely:
> > 
> > default behavior: A + 1
> > UBLK_F_USER_RECOVERY: B + 2
> > UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
> 
> ublk is supposed to support A, B & C for both 1 and both 2, but it may
> depend on how ublk server is implemented.
> 
> In cover letter, it is mentioned that "A + 2 is a currently unsupported
> behavior", can you explain it a bit? Such as, how does ublk server
> handle the I/O error? And when/how to recover? why doesn't this way
> work?

Sorry if this was unclear - the behaviors I describe in A, B, C, 1, 2
are all referring to what is seen by the application using the ublk
block device when the ublk server crashes. There is no sense in which
the ublk server can "handle" the I/O error because during this time,
there is no ublk server and all decisions on how to handle I/O are made
by ublk_drv directly (based on configuration flags specified when the
device was created).

If the ublk server created the device with UBLK_F_USER_RECOVERY, then
when the ublk server has crashed (and not restarted yet), I/Os issued by
the application will queue/hang until the ublk server comes back and
recovers the device, because the underlying request_queue is left in a
quiesced state. So in this case, behavior A is not possible.

If the ublk server created the device without UBLK_F_USER_RECOVERY, then
when the ublk server has crashed (and not restarted yet), I/Os issued by
the application will immediately error (since in this case, ublk will
call del_gendisk).  However, when the ublk server restarts, it cannot
recover the existing ublk device - the disk has been deleted and the
ublk device is in state UBLK_S_DEV_DEAD from which recovery is not
permitted. So in this case, behavior 2 is not possible.

Hence A + 2 is impossible with the current ublk_drv implementation.
Please correct me if I missed something.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-26 17:22     ` Uday Shankar
@ 2024-06-27  1:17       ` Ming Lei
  2024-06-27 17:09         ` Uday Shankar
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-06-27  1:17 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block

On Wed, Jun 26, 2024 at 11:22:43AM -0600, Uday Shankar wrote:
> On Tue, Jun 18, 2024 at 10:11:51AM +0800, Ming Lei wrote:
> > On Mon, Jun 17, 2024 at 01:44:49PM -0600, Uday Shankar wrote:
> > > ublk currently supports the following behaviors on ublk server exit:
> > > 
> > > A: outstanding I/Os get errors, subsequently issued I/Os get errors
> > > B: outstanding I/Os get errors, subsequently issued I/Os queue
> > > C: outstanding I/Os get reissued, subsequently issued I/Os queue
> > > 
> > > and the following behaviors for recovery of preexisting block devices by
> > > a future incarnation of the ublk server:
> > > 
> > > 1: ublk devices stopped on ublk server exit (no recovery possible)
> > > 2: ublk devices are recoverable using start/end_recovery commands
> > > 
> > > The userspace interface allows selection of combinations of these
> > > behaviors using flags specified at device creation time, namely:
> > > 
> > > default behavior: A + 1
> > > UBLK_F_USER_RECOVERY: B + 2
> > > UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
> > 
> > ublk is supposed to support A, B & C for both 1 and both 2, but it may
> > depend on how ublk server is implemented.
> > 
> > In cover letter, it is mentioned that "A + 2 is a currently unsupported
> > behavior", can you explain it a bit? Such as, how does ublk server
> > handle the I/O error? And when/how to recover? why doesn't this way
> > work?
> 
> Sorry if this was unclear - the behaviors I describe in A, B, C, 1, 2
> are all referring to what is seen by the application using the ublk
> block device when the ublk server crashes. There is no sense in which

Yes, usually the app using ublk is supposed to be completely generic,
and won't be taken into account.

> the ublk server can "handle" the I/O error because during this time,
> there is no ublk server and all decisions on how to handle I/O are made
> by ublk_drv directly (based on configuration flags specified when the
> device was created).
> 
> If the ublk server created the device with UBLK_F_USER_RECOVERY, then
> when the ublk server has crashed (and not restarted yet), I/Os issued by
> the application will queue/hang until the ublk server comes back and
> recovers the device, because the underlying request_queue is left in a
> quiesced state. So in this case, behavior A is not possible.

When ublk server is crashed, ublk_abort_requests() will be called to fail
queued inflight requests. Meantime ubq->canceling is set to requeue
new request instead of forwarding it to ublk server.

So behavior A should be supported easily by failing request in
ublk_queue_rq() if ubq->canceling is set.

> 
> If the ublk server created the device without UBLK_F_USER_RECOVERY, then
> when the ublk server has crashed (and not restarted yet), I/Os issued by
> the application will immediately error (since in this case, ublk will
> call del_gendisk).  However, when the ublk server restarts, it cannot
> recover the existing ublk device - the disk has been deleted and the
> ublk device is in state UBLK_S_DEV_DEAD from which recovery is not
> permitted. So in this case, behavior 2 is not possible.

UBLK_F_USER_RECOVERY is supposed for supporting to recover device, and
if this flag isn't enabled, we don't support the feature simply, so
looks behavior 2 isn't one valid case, is it?

> 
> Hence A + 2 is impossible with the current ublk_drv implementation.
> Please correct me if I missed something.

Please see if the above reply can address this case.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-27  1:17       ` Ming Lei
@ 2024-06-27 17:09         ` Uday Shankar
  2024-06-30 13:56           ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-06-27 17:09 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

When I say "behavior A + 2," I mean behavior A and behavior 2 at the
same time on the same ublk device. I still think this is not supported
with current ublk_drv, see below.

> > the ublk server can "handle" the I/O error because during this time,
> > there is no ublk server and all decisions on how to handle I/O are made
> > by ublk_drv directly (based on configuration flags specified when the
> > device was created).
> > 
> > If the ublk server created the device with UBLK_F_USER_RECOVERY, then
> > when the ublk server has crashed (and not restarted yet), I/Os issued by
> > the application will queue/hang until the ublk server comes back and
> > recovers the device, because the underlying request_queue is left in a
> > quiesced state. So in this case, behavior A is not possible.
> 
> When ublk server is crashed, ublk_abort_requests() will be called to fail
> queued inflight requests. Meantime ubq->canceling is set to requeue
> new request instead of forwarding it to ublk server.
> 
> So behavior A should be supported easily by failing request in
> ublk_queue_rq() if ubq->canceling is set.

This argument only works for devices created without
UBLK_F_USER_RECOVERY. If UBLK_F_USER_RECOVERY is set, then the
request_queue for the device is left in a quiesced state and so I/Os
will not even get to ublk_queue_rq. See the following as proof (using a
build of ublksrv master):

# ./ublk add -t loop -f file -r 1
dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 2097152
        max rq size 524288 daemon pid 244608 flags 0x4a state LIVE
        ublkc: 240:0 ublkb: 259:0 owner: 0:0
        queue 0: tid 244610 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 )
        target {"backing_file":"file","dev_size":1073741824,"direct_io":1,"name":"loop","type":1}
# kill -9 244608
# dd if=/dev/urandom of=/dev/ublkb0 count=1 bs=4096 oflag=direct
(hung)

# ps aux | grep " D"
root      244626  0.0  0.0   5620  1880 pts/0    D+   16:57   0:00 dd if=/dev/urandom of=/dev/ublkb0 count=1 bs=4096 oflag=direct
root      244656  0.0  0.0   6408  2188 pts/1    S+   16:58   0:00 grep --color=auto  D
# cat /proc/244626/stack
[<0>] submit_bio_wait+0x63/0x90
[<0>] __blkdev_direct_IO_simple+0xd9/0x1e0
[<0>] blkdev_write_iter+0x1b4/0x230
[<0>] vfs_write+0x2ae/0x3d0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5d/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53
# cat /sys/kernel/debug/block/ublkb0/state
SAME_COMP|NONROT|IO_STAT|INIT_DONE|STATS|REGISTERED|QUIESCED|NOWAIT|SQ_SCHED

Therefore, in order to obtain behavior A with current ublk_drv, one must
not set UBLK_F_USER_RECOVERY.

> > 
> > If the ublk server created the device without UBLK_F_USER_RECOVERY, then
> > when the ublk server has crashed (and not restarted yet), I/Os issued by
> > the application will immediately error (since in this case, ublk will
> > call del_gendisk).  However, when the ublk server restarts, it cannot
> > recover the existing ublk device - the disk has been deleted and the
> > ublk device is in state UBLK_S_DEV_DEAD from which recovery is not
> > permitted. So in this case, behavior 2 is not possible.
> 
> UBLK_F_USER_RECOVERY is supposed for supporting to recover device, and
> if this flag isn't enabled, we don't support the feature simply, so
> looks behavior 2 isn't one valid case, is it?

Sure, so we're in agreement that recovery is impossible if
UBLK_F_USER_RECOVERY is not set.

So:
- To get behavior A, UBLK_F_USER_RECOVERY must be unset
- To get behavior 2, UBLK_F_USER_RECOVERY must be set

Hence, having behavior A and behavior 2, at the same time, on the same
device, would require UBLK_F_USER_RECOVERY to be both set and unset when
that device is created. Obviously that's impossible.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-27 17:09         ` Uday Shankar
@ 2024-06-30 13:56           ` Ming Lei
  2024-07-01 21:02             ` Uday Shankar
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-06-30 13:56 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block, ming.lei

On Thu, Jun 27, 2024 at 11:09:15AM -0600, Uday Shankar wrote:
> When I say "behavior A + 2," I mean behavior A and behavior 2 at the
> same time on the same ublk device. I still think this is not supported
> with current ublk_drv, see below.
> 
> > > the ublk server can "handle" the I/O error because during this time,
> > > there is no ublk server and all decisions on how to handle I/O are made
> > > by ublk_drv directly (based on configuration flags specified when the
> > > device was created).
> > > 
> > > If the ublk server created the device with UBLK_F_USER_RECOVERY, then
> > > when the ublk server has crashed (and not restarted yet), I/Os issued by
> > > the application will queue/hang until the ublk server comes back and
> > > recovers the device, because the underlying request_queue is left in a
> > > quiesced state. So in this case, behavior A is not possible.
> > 
> > When ublk server is crashed, ublk_abort_requests() will be called to fail
> > queued inflight requests. Meantime ubq->canceling is set to requeue
> > new request instead of forwarding it to ublk server.
> > 
> > So behavior A should be supported easily by failing request in
> > ublk_queue_rq() if ubq->canceling is set.
> 
> This argument only works for devices created without
> UBLK_F_USER_RECOVERY. If UBLK_F_USER_RECOVERY is set, then the
> request_queue for the device is left in a quiesced state and so I/Os
> will not even get to ublk_queue_rq. See the following as proof (using a
> build of ublksrv master):

I meant that the following one-line patch may address your issue:


diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4e159948c912..a89240f4f7b0 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1068,7 +1068,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 		struct request *rq)
 {
 	/* We cannot process this rq so just requeue it. */
-	if (ublk_queue_can_use_recovery(ubq))
+	if (ublk_queue_can_use_recovery_reissue(ubq))
 		blk_mq_requeue_request(rq, false);
 	else
 		blk_mq_end_request(rq, BLK_STS_IOERR);


Thanks, 
Ming


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-30 13:56           ` Ming Lei
@ 2024-07-01 21:02             ` Uday Shankar
  2024-07-02  4:07               ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Uday Shankar @ 2024-07-01 21:02 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Sun, Jun 30, 2024 at 09:56:36PM +0800, Ming Lei wrote:
> I meant that the following one-line patch may address your issue:
> 
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 4e159948c912..a89240f4f7b0 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1068,7 +1068,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
>  		struct request *rq)
>  {
>  	/* We cannot process this rq so just requeue it. */
> -	if (ublk_queue_can_use_recovery(ubq))
> +	if (ublk_queue_can_use_recovery_reissue(ubq))
>  		blk_mq_requeue_request(rq, false);
>  	else
>  		blk_mq_end_request(rq, BLK_STS_IOERR);

It does not work (ran the same test from my previous email, got the same
results), and how could it? As I've already mentioned several times, the
root of the issue is that when UBLK_F_USER_RECOVERY is set, the request
queue remains quiesced when the server has exited. Quiescing the queue
means that the block layer will not call the driver's queue_rq when I/Os
are submitted. Instead the block layer will queue those I/Os internally,
only submitting them to the driver when the queue is unquiesced, which,
in the current ublk_drv, only happens when the device is recovered or
deleted.

Having ublk_drv return errors to I/Os issued while there is no ublk
server requires the queue to be unquiesced. My patchset actually does
this (see patch 4/4).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-07-01 21:02             ` Uday Shankar
@ 2024-07-02  4:07               ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2024-07-02  4:07 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block

On Mon, Jul 01, 2024 at 03:02:49PM -0600, Uday Shankar wrote:
> On Sun, Jun 30, 2024 at 09:56:36PM +0800, Ming Lei wrote:
> > I meant that the following one-line patch may address your issue:
> > 
> > 
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 4e159948c912..a89240f4f7b0 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -1068,7 +1068,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
> >  		struct request *rq)
> >  {
> >  	/* We cannot process this rq so just requeue it. */
> > -	if (ublk_queue_can_use_recovery(ubq))
> > +	if (ublk_queue_can_use_recovery_reissue(ubq))
> >  		blk_mq_requeue_request(rq, false);
> >  	else
> >  		blk_mq_end_request(rq, BLK_STS_IOERR);
> 
> It does not work (ran the same test from my previous email, got the same
> results), and how could it? As I've already mentioned several times, the
> root of the issue is that when UBLK_F_USER_RECOVERY is set, the request
> queue remains quiesced when the server has exited. Quiescing the queue
> means that the block layer will not call the driver's queue_rq when I/Os
> are submitted. Instead the block layer will queue those I/Os internally,
> only submitting them to the driver when the queue is unquiesced, which,
> in the current ublk_drv, only happens when the device is recovered or
> deleted.
> 
> Having ublk_drv return errors to I/Os issued while there is no ublk
> server requires the queue to be unquiesced. My patchset actually does
> this (see patch 4/4).

Sorry for ignoring the fact that queue is kept as quiesced after ublk
server is crashed, and I will review patch 4 soon.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-06-17 19:44 ` [PATCH 2/4] ublk: refactor recovery configuration flag helpers Uday Shankar
  2024-06-18  2:11   ` Ming Lei
@ 2024-07-02 11:09   ` Ming Lei
  2024-09-17  0:26     ` Uday Shankar
  1 sibling, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-07-02 11:09 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block, ming.lei

On Mon, Jun 17, 2024 at 01:44:49PM -0600, Uday Shankar wrote:
> ublk currently supports the following behaviors on ublk server exit:
> 
> A: outstanding I/Os get errors, subsequently issued I/Os get errors
> B: outstanding I/Os get errors, subsequently issued I/Os queue
> C: outstanding I/Os get reissued, subsequently issued I/Os queue
> 
> and the following behaviors for recovery of preexisting block devices by
> a future incarnation of the ublk server:
> 
> 1: ublk devices stopped on ublk server exit (no recovery possible)
> 2: ublk devices are recoverable using start/end_recovery commands
> 
> The userspace interface allows selection of combinations of these
> behaviors using flags specified at device creation time, namely:
> 
> default behavior: A + 1
> UBLK_F_USER_RECOVERY: B + 2
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
> 
> We can't easily change the userspace interface to allow independent
> selection of one of {A, B, C} and one of {1, 2}, but we can refactor the
> internal helpers which test for the flags. Replace the existing helpers
> with the following set:
> 
> ublk_nosrv_should_reissue_outstanding: tests for behavior C
> ublk_nosrv_should_queue_io: tests for behavior B
> ublk_nosrv_should_stop_dev: tests for behavior 1
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
>  drivers/block/ublk_drv.c | 55 +++++++++++++++++++++++++---------------
>  1 file changed, 34 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 2752a9afe9d4..e8cca58a71bc 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -652,22 +652,35 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
>  			PAGE_SIZE);
>  }
>  
> -static inline bool ublk_queue_can_use_recovery_reissue(
> -		struct ublk_queue *ubq)
> +/*
> + * Should I/O outstanding to the ublk server when it exits be reissued?
> + * If not, outstanding I/O will get errors.
> + */
> +static inline bool ublk_nosrv_should_reissue_outstanding(struct ublk_device *ub)
>  {
> -	return (ubq->flags & UBLK_F_USER_RECOVERY) &&
> -			(ubq->flags & UBLK_F_USER_RECOVERY_REISSUE);
> +	return (ub->dev_info.flags & UBLK_F_USER_RECOVERY) &&
> +	       (ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE);
>  }
>  
> -static inline bool ublk_queue_can_use_recovery(
> -		struct ublk_queue *ubq)
> +/*
> + * Should I/O issued while there is no ublk server queue? If not, I/O
> + * issued while there is no ublk server will get errors.
> + */
> +static inline bool ublk_nosrv_should_queue_io(struct ublk_device *ub)
>  {
> -	return ubq->flags & UBLK_F_USER_RECOVERY;
> +	return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
>  }
>  
> -static inline bool ublk_can_use_recovery(struct ublk_device *ub)
> +/*
> + * Should ublk devices be stopped (i.e. no recovery possible) when the
> + * ublk server exits? If not, devices can be used again by a future
> + * incarnation of a ublk server via the start_recovery/end_recovery
> + * commands.
> + */
> +static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
>  {
> -	return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
> +	return (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY)) &&
> +	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE));
>  }
>  
>  static void ublk_free_disk(struct gendisk *disk)
> @@ -1043,7 +1056,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
>  {
>  	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
>  
> -	if (ublk_queue_can_use_recovery_reissue(ubq))
> +	if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
>  		blk_mq_requeue_request(req, false);
>  	else
>  		ublk_put_req_ref(ubq, req);
> @@ -1071,7 +1084,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
>  		struct request *rq)
>  {
>  	/* We cannot process this rq so just requeue it. */
> -	if (ublk_queue_can_use_recovery(ubq))
> +	if (ublk_nosrv_should_queue_io(ubq->dev))

I feel the name of ublk_nosrv_should_queue_io() is a bit misleading.

The difference between ublk_queue_can_use_recovery() and
ublk_queue_can_use_recovery_reissue() is clear, and
both two need to queue ios actually in case of nosrv most times
except for this one.

However, looks your patch just tries to replace
ublk_queue_can_use_recovery() with ublk_nosrv_should_queue_io().

>  		blk_mq_requeue_request(rq, false);
>  	else
>  		blk_mq_end_request(rq, BLK_STS_IOERR);
> @@ -1216,10 +1229,10 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>  		struct ublk_device *ub = ubq->dev;
>  
>  		if (ublk_abort_requests(ub, ubq)) {
> -			if (ublk_can_use_recovery(ub))
> -				schedule_work(&ub->quiesce_work);
> -			else
> +			if (ublk_nosrv_should_stop_dev(ub))

The helper looks easy to follow, include the following conversions.

>  				schedule_work(&ub->stop_work);
> +			else
> +				schedule_work(&ub->quiesce_work);
>  		}
>  		return BLK_EH_DONE;
>  	}
> @@ -1248,7 +1261,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	 * Note: force_abort is guaranteed to be seen because it is set
>  	 * before request queue is unqiuesced.
>  	 */
> -	if (ublk_queue_can_use_recovery(ubq) && unlikely(ubq->force_abort))
> +	if (ublk_nosrv_should_queue_io(ubq->dev) && unlikely(ubq->force_abort))
>  		return BLK_STS_IOERR;

I'd rather to not fetch ublk_device in fast io path since ublk is MQ
device, and only the queue structure should be touched in fast io path,
but it is fine to check device in any slow path.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] ublk: merge stop_work and quiesce_work
  2024-06-17 19:44 ` [PATCH 3/4] ublk: merge stop_work and quiesce_work Uday Shankar
@ 2024-07-02 13:31   ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2024-07-02 13:31 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block

On Mon, Jun 17, 2024 at 01:44:50PM -0600, Uday Shankar wrote:
> Save some lines by merging stop_work and quiesce_work into nosrv_work,
> which looks at the recovery flags and does the right thing when the "no
> ublk server" condition is detected.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
>  drivers/block/ublk_drv.c | 64 ++++++++++++++++------------------------
>  1 file changed, 25 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index e8cca58a71bc..0496fa372cc1 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -182,8 +182,7 @@ struct ublk_device {
>  	unsigned int		nr_queues_ready;
>  	unsigned int		nr_privileged_daemon;
>  
> -	struct work_struct	quiesce_work;
> -	struct work_struct	stop_work;
> +	struct work_struct	nosrv_work;
>  };
>  
>  /* header of ublk_params */
> @@ -1229,10 +1228,7 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>  		struct ublk_device *ub = ubq->dev;
>  
>  		if (ublk_abort_requests(ub, ubq)) {
> -			if (ublk_nosrv_should_stop_dev(ub))
> -				schedule_work(&ub->stop_work);
> -			else
> -				schedule_work(&ub->quiesce_work);
> +			schedule_work(&ub->nosrv_work);
>  		}
>  		return BLK_EH_DONE;
>  	}
> @@ -1482,10 +1478,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>  	ublk_cancel_cmd(ubq, io, issue_flags);
>  
>  	if (need_schedule) {
> -		if (ublk_nosrv_should_stop_dev(ub))
> -			schedule_work(&ub->stop_work);
> -		else
> -			schedule_work(&ub->quiesce_work);
> +		schedule_work(&ub->nosrv_work);
>  	}
>  }
>  
> @@ -1548,20 +1541,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub)
>  	ub->dev_info.state = UBLK_S_DEV_QUIESCED;
>  }
>  
> -static void ublk_quiesce_work_fn(struct work_struct *work)
> -{
> -	struct ublk_device *ub =
> -		container_of(work, struct ublk_device, quiesce_work);
> -
> -	mutex_lock(&ub->mutex);
> -	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> -		goto unlock;
> -	__ublk_quiesce_dev(ub);
> - unlock:
> -	mutex_unlock(&ub->mutex);
> -	ublk_cancel_dev(ub);
> -}
> -
>  static void ublk_unquiesce_dev(struct ublk_device *ub)
>  {
>  	int i;
> @@ -1610,6 +1589,25 @@ static void ublk_stop_dev(struct ublk_device *ub)
>  	ublk_cancel_dev(ub);
>  }
>  
> +static void ublk_nosrv_work(struct work_struct *work)
> +{
> +	struct ublk_device *ub =
> +		container_of(work, struct ublk_device, nosrv_work);
> +
> +	if (ublk_nosrv_should_stop_dev(ub)) {
> +		ublk_stop_dev(ub);
> +		return;
> +	}
> +
> +	mutex_lock(&ub->mutex);
> +	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> +		goto unlock;
> +	__ublk_quiesce_dev(ub);
> + unlock:
> +	mutex_unlock(&ub->mutex);
> +	ublk_cancel_dev(ub);
> +}
> +
>  /* device can only be started after all IOs are ready */
>  static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
>  {
> @@ -2124,14 +2122,6 @@ static int ublk_add_chdev(struct ublk_device *ub)
>  	return ret;
>  }
>  
> -static void ublk_stop_work_fn(struct work_struct *work)
> -{
> -	struct ublk_device *ub =
> -		container_of(work, struct ublk_device, stop_work);
> -
> -	ublk_stop_dev(ub);
> -}
> -
>  /* align max io buffer size with PAGE_SIZE */
>  static void ublk_align_max_io_size(struct ublk_device *ub)
>  {
> @@ -2156,8 +2146,7 @@ static int ublk_add_tag_set(struct ublk_device *ub)
>  static void ublk_remove(struct ublk_device *ub)
>  {
>  	ublk_stop_dev(ub);
> -	cancel_work_sync(&ub->stop_work);
> -	cancel_work_sync(&ub->quiesce_work);
> +	cancel_work_sync(&ub->nosrv_work);
>  	cdev_device_del(&ub->cdev, &ub->cdev_dev);
>  	ublk_put_device(ub);
>  	ublks_added--;
> @@ -2412,8 +2401,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>  		goto out_unlock;
>  	mutex_init(&ub->mutex);
>  	spin_lock_init(&ub->lock);
> -	INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
> -	INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
> +	INIT_WORK(&ub->nosrv_work, ublk_nosrv_work);
>  
>  	ret = ublk_alloc_dev_number(ub, header->dev_id);
>  	if (ret < 0)
> @@ -2548,9 +2536,7 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd)
>  static int ublk_ctrl_stop_dev(struct ublk_device *ub)
>  {
>  	ublk_stop_dev(ub);
> -	cancel_work_sync(&ub->stop_work);
> -	cancel_work_sync(&ub->quiesce_work);
> -
> +	cancel_work_sync(&ub->nosrv_work);
>  	return 0;
>  }

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] ublk: support device recovery without I/O queueing
  2024-06-17 19:44 ` [PATCH 4/4] ublk: support device recovery without I/O queueing Uday Shankar
@ 2024-07-02 13:46   ` Ming Lei
  2024-09-17  0:29     ` Uday Shankar
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2024-07-02 13:46 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block, ming.lei

On Mon, Jun 17, 2024 at 01:44:51PM -0600, Uday Shankar wrote:
> ublk currently supports the following behaviors on ublk server exit:
> 
> A: outstanding I/Os get errors, subsequently issued I/Os get errors
> B: outstanding I/Os get errors, subsequently issued I/Os queue
> C: outstanding I/Os get reissued, subsequently issued I/Os queue
> 
> and the following behaviors for recovery of preexisting block devices by
> a future incarnation of the ublk server:
> 
> 1: ublk devices stopped on ublk server exit (no recovery possible)
> 2: ublk devices are recoverable using start/end_recovery commands
> 
> The userspace interface allows selection of combinations of these
> behaviors using flags specified at device creation time, namely:
> 
> default behavior: A + 1
> UBLK_F_USER_RECOVERY: B + 2
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
> 
> The behavior A + 2 is currently unsupported. Add support for this
> behavior under the new flag UBLK_F_USER_RECOVERY_NOQUEUE.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
>  drivers/block/ublk_drv.c      | 53 +++++++++++++++++++++++++++--------
>  include/uapi/linux/ublk_cmd.h | 18 ++++++++++++
>  2 files changed, 60 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 0496fa372cc1..4fec8b48d30e 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -57,10 +57,12 @@
>  		| UBLK_F_UNPRIVILEGED_DEV \
>  		| UBLK_F_CMD_IOCTL_ENCODE \
>  		| UBLK_F_USER_COPY \
> -		| UBLK_F_ZONED)
> +		| UBLK_F_ZONED \
> +		| UBLK_F_USER_RECOVERY_NOQUEUE)
>  
>  #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> -		| UBLK_F_USER_RECOVERY_REISSUE)
> +		| UBLK_F_USER_RECOVERY_REISSUE \
> +		| UBLK_F_USER_RECOVERY_NOQUEUE)
>  
>  /* All UBLK_PARAM_TYPE_* should be included here */
>  #define UBLK_PARAM_TYPE_ALL                                \
> @@ -679,7 +681,14 @@ static inline bool ublk_nosrv_should_queue_io(struct ublk_device *ub)
>  static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
>  {
>  	return (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY)) &&
> -	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE));
> +	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE)) &&
> +	       (!(ub->dev_info.flags & UBLK_F_USER_RECOVERY_NOQUEUE));
> +}
> +
> +static inline bool ublk_dev_in_recoverable_state(struct ublk_device *ub)
> +{
> +	return ub->dev_info.state == UBLK_S_DEV_QUIESCED ||
> +	       ub->dev_info.state == UBLK_S_DEV_FAIL_IO;
>  }
>  
>  static void ublk_free_disk(struct gendisk *disk)
> @@ -1243,6 +1252,11 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct request *rq = bd->rq;
>  	blk_status_t res;
>  
> +	if (ubq->dev->dev_info.state == UBLK_S_DEV_FAIL_IO) {
> +		return BLK_STS_TARGET;
> +	}
> +	WARN_ON_ONCE(ubq->dev->dev_info.state != UBLK_S_DEV_LIVE);
> +

I'd suggest to add one per-ublk-queue flag for this purpose instead of
device state, then fetching device can be avoided in fast io path, please see
similar example of `->force_abort`.

>  	/* fill iod to slot in io cmd buffer */
>  	res = ublk_setup_iod(ubq, rq);
>  	if (unlikely(res != BLK_STS_OK))
> @@ -1602,7 +1616,15 @@ static void ublk_nosrv_work(struct work_struct *work)
>  	mutex_lock(&ub->mutex);
>  	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
>  		goto unlock;
> -	__ublk_quiesce_dev(ub);
> +
> +	if (ublk_nosrv_should_queue_io(ub)) {

Here ublk_nosrv_should_queue_io() doesn't cover UBLK_F_USER_RECOVERY_REISSUE.

> +		__ublk_quiesce_dev(ub);
> +	} else {
> +		blk_mq_quiesce_queue(ub->ub_disk->queue);
> +		ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
> +		blk_mq_unquiesce_queue(ub->ub_disk->queue);
> +	}

If the above extra device state is saved, the whole change can be simplified,
and __ublk_quiesce_dev() still can be called for
UBLK_F_USER_RECOVERY_NOQUEUE, and UBLK_S_DEV_QUIESCED can cover this new flag,
meantime setting one per-ublk-queue flag, such as, ->fail_io_in_recovery.

> +
>   unlock:
>  	mutex_unlock(&ub->mutex);
>  	ublk_cancel_dev(ub);
> @@ -2351,6 +2373,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>  	case 0:
>  	case UBLK_F_USER_RECOVERY:
>  	case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
> +	case UBLK_F_USER_RECOVERY_NOQUEUE:
>  		break;
>  	default:
>  		pr_warn("%s: invalid recovery flags %llx\n", __func__,
> @@ -2682,14 +2705,18 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
>  	 *     and related io_uring ctx is freed so file struct of /dev/ublkcX is
>  	 *     released.
>  	 *
> +	 * and one of the following holds
> +	 *
>  	 * (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
>  	 *     (a)has quiesced request queue
>  	 *     (b)has requeued every inflight rqs whose io_flags is ACTIVE
>  	 *     (c)has requeued/aborted every inflight rqs whose io_flags is NOT ACTIVE
>  	 *     (d)has completed/camceled all ioucmds owned by ther dying process
> +	 *
> +	 * (3) UBLK_S_DEV_FAIL_IO is set, which means the queue is not
> +	 *     quiesced, but all I/O is being immediately errored
>  	 */
> -	if (test_bit(UB_STATE_OPEN, &ub->state) ||
> -			ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
> +	if (test_bit(UB_STATE_OPEN, &ub->state) || !ublk_dev_in_recoverable_state(ub)) {
>  		ret = -EBUSY;
>  		goto out_unlock;
>  	}
> @@ -2727,18 +2754,22 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
>  	if (ublk_nosrv_should_stop_dev(ub))
>  		goto out_unlock;
>  
> -	if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
> +	if (!ublk_dev_in_recoverable_state(ub)) {
>  		ret = -EBUSY;
>  		goto out_unlock;
>  	}
>  	ub->dev_info.ublksrv_pid = ublksrv_pid;
>  	pr_devel("%s: new ublksrv_pid %d, dev id %d\n",
>  			__func__, ublksrv_pid, header->dev_id);
> -	blk_mq_unquiesce_queue(ub->ub_disk->queue);
> -	pr_devel("%s: queue unquiesced, dev id %d.\n",
> -			__func__, header->dev_id);
> -	blk_mq_kick_requeue_list(ub->ub_disk->queue);
> +
>  	ub->dev_info.state = UBLK_S_DEV_LIVE;
> +	if (ublk_nosrv_should_queue_io(ub)) {
> +		blk_mq_unquiesce_queue(ub->ub_disk->queue);
> +		pr_devel("%s: queue unquiesced, dev id %d.\n",
> +				__func__, header->dev_id);
> +		blk_mq_kick_requeue_list(ub->ub_disk->queue);
> +	}
> +
>  	ret = 0;
>   out_unlock:
>  	mutex_unlock(&ub->mutex);
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index c8dc5f8ea699..c4512b3a3c52 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -147,8 +147,18 @@
>   */
>  #define UBLK_F_NEED_GET_DATA (1UL << 2)
>  
> +/*
> + * - Block devices are recoverable if ublk server exits and restarts
> + * - Outstanding I/O when ublk server exits is met with errors
> + * - I/O issued while there is no ublk server queues
> + */
>  #define UBLK_F_USER_RECOVERY	(1UL << 3)
>  
> +/*
> + * - Block devices are recoverable if ublk server exits and restarts
> + * - Outstanding I/O when ublk server exits is reissued
> + * - I/O issued while there is no ublk server queues
> + */
>  #define UBLK_F_USER_RECOVERY_REISSUE	(1UL << 4)
>  
>  /*
> @@ -184,10 +194,18 @@
>   */
>  #define UBLK_F_ZONED (1ULL << 8)
>  
> +/*
> + * - Block devices are recoverable if ublk server exits and restarts
> + * - Outstanding I/O when ublk server exits is met with errors
> + * - I/O issued while there is no ublk server is met with errors
> + */
> +#define UBLK_F_USER_RECOVERY_NOQUEUE (1ULL << 9)

Maybe UBLK_F_USER_RECOVERY_FAIL_IO is more readable?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] ublk: check recovery flags for validity
  2024-06-18  2:00   ` Ming Lei
@ 2024-07-23 19:32     ` Uday Shankar
  0 siblings, 0 replies; 19+ messages in thread
From: Uday Shankar @ 2024-07-23 19:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Tue, Jun 18, 2024 at 10:00:08AM +0800, Ming Lei wrote:
> > +	/* forbid nonsense combinations of recovery flags */
> > +	switch (info.flags & UBLK_F_ALL_RECOVERY_FLAGS) {
> > +	case 0:
> > +	case UBLK_F_USER_RECOVERY:
> > +	case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
> > +		break;
> > +	default:
> > +		pr_warn("%s: invalid recovery flags %llx\n", __func__,
> > +			info.flags & UBLK_F_ALL_RECOVERY_FLAGS);
> > +		return -EINVAL;
> > +	}
> > +
> 
> It could be cleaner and more readable to check the fail condition only:
> 
> 	if ((info.flags & UBLK_F_USER_RECOVERY_REISSUE) &&
> 		!(info.flags & UBLK_F_USER_RECOVERY)) {
> 		...
> 	}

Will do. But note that in patch 4 this check gets more complex so we may
change it back to a switch statement.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] ublk: refactor recovery configuration flag helpers
  2024-07-02 11:09   ` Ming Lei
@ 2024-09-17  0:26     ` Uday Shankar
  0 siblings, 0 replies; 19+ messages in thread
From: Uday Shankar @ 2024-09-17  0:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Tue, Jul 02, 2024 at 07:09:24PM +0800, Ming Lei wrote:
> I feel the name of ublk_nosrv_should_queue_io() is a bit misleading.
> 
> The difference between ublk_queue_can_use_recovery() and
> ublk_queue_can_use_recovery_reissue() is clear, and
> both two need to queue ios actually in case of nosrv most times
> except for this one.

Well yeah, this refactoring is all to set up for the final patch in this
series. Taken in isolation, it might appear pointless.

I'm open to suggestions for better names if you have them.

> I'd rather to not fetch ublk_device in fast io path since ublk is MQ
> device, and only the queue structure should be touched in fast io path,
> but it is fine to check device in any slow path.

Added a helper which checks the queue-local copy of the device flags,
and used it in ublk_queue_rq in v2. Sorry for the delay; please take a
look.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] ublk: support device recovery without I/O queueing
  2024-07-02 13:46   ` Ming Lei
@ 2024-09-17  0:29     ` Uday Shankar
  0 siblings, 0 replies; 19+ messages in thread
From: Uday Shankar @ 2024-09-17  0:29 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Tue, Jul 02, 2024 at 09:46:00PM +0800, Ming Lei wrote:
> I'd suggest to add one per-ublk-queue flag for this purpose instead of
> device state, then fetching device can be avoided in fast io path, please see
> similar example of `->force_abort`.

Done in v2.

> >  	/* fill iod to slot in io cmd buffer */
> >  	res = ublk_setup_iod(ubq, rq);
> >  	if (unlikely(res != BLK_STS_OK))
> > @@ -1602,7 +1616,15 @@ static void ublk_nosrv_work(struct work_struct *work)
> >  	mutex_lock(&ub->mutex);
> >  	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> >  		goto unlock;
> > -	__ublk_quiesce_dev(ub);
> > +
> > +	if (ublk_nosrv_should_queue_io(ub)) {
> 
> Here ublk_nosrv_should_queue_io() doesn't cover UBLK_F_USER_RECOVERY_REISSUE.

Not sure what you mean here. I don't see an issue, can you explain?

> 
> > +		__ublk_quiesce_dev(ub);
> > +	} else {
> > +		blk_mq_quiesce_queue(ub->ub_disk->queue);
> > +		ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
> > +		blk_mq_unquiesce_queue(ub->ub_disk->queue);
> > +	}
> 
> If the above extra device state is saved, the whole change can be simplified,
> and __ublk_quiesce_dev() still can be called for
> UBLK_F_USER_RECOVERY_NOQUEUE, and UBLK_S_DEV_QUIESCED can cover this new flag,
> meantime setting one per-ublk-queue flag, such as, ->fail_io_in_recovery.

I don't think it's a good idea to have the state UBLK_S_DEV_QUIESCED
cover the new flag. Then we will have a case where the state is
UBLK_S_DEV_QUIESCED but the queue is not actually quiesced... that just
seems confusing. I added a per-queue flag and used it in the fast path,
but kept the new state as well.

> > +/*
> > + * - Block devices are recoverable if ublk server exits and restarts
> > + * - Outstanding I/O when ublk server exits is met with errors
> > + * - I/O issued while there is no ublk server is met with errors
> > + */
> > +#define UBLK_F_USER_RECOVERY_NOQUEUE (1ULL << 9)
> 
> Maybe UBLK_F_USER_RECOVERY_FAIL_IO is more readable?

Sure, changed the name.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-09-17  0:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17 19:44 [PATCH 0/4] ublk: support device recovery without I/O queueing Uday Shankar
2024-06-17 19:44 ` [PATCH 1/4] ublk: check recovery flags for validity Uday Shankar
2024-06-18  2:00   ` Ming Lei
2024-07-23 19:32     ` Uday Shankar
2024-06-17 19:44 ` [PATCH 2/4] ublk: refactor recovery configuration flag helpers Uday Shankar
2024-06-18  2:11   ` Ming Lei
2024-06-26 17:22     ` Uday Shankar
2024-06-27  1:17       ` Ming Lei
2024-06-27 17:09         ` Uday Shankar
2024-06-30 13:56           ` Ming Lei
2024-07-01 21:02             ` Uday Shankar
2024-07-02  4:07               ` Ming Lei
2024-07-02 11:09   ` Ming Lei
2024-09-17  0:26     ` Uday Shankar
2024-06-17 19:44 ` [PATCH 3/4] ublk: merge stop_work and quiesce_work Uday Shankar
2024-07-02 13:31   ` Ming Lei
2024-06-17 19:44 ` [PATCH 4/4] ublk: support device recovery without I/O queueing Uday Shankar
2024-07-02 13:46   ` Ming Lei
2024-09-17  0:29     ` Uday Shankar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).