* [PATCH v3 0/5] ublk: support device recovery without I/O queueing
@ 2024-10-02 22:09 Uday Shankar
2024-10-02 22:09 ` [PATCH v3 1/5] ublk: check recovery flags for validity Uday Shankar
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
ublk currently supports the following behaviors on ublk server exit:
A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue
and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:
1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands
The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:
default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
A + 2 is a currently unsupported behavior. This patch series aims to add
support for it.
Userspace support and testing for this flag are available at:
https://github.com/ublk-org/ublksrv/pull/73
Uday Shankar (5):
ublk: check recovery flags for validity
ublk: refactor recovery configuration flag helpers
ublk: merge stop_work and quiesce_work
ublk: support device recovery without I/O queueing
Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO
Documentation/block/ublk.rst | 24 +++--
drivers/block/ublk_drv.c | 191 +++++++++++++++++++++++-----------
include/uapi/linux/ublk_cmd.h | 18 ++++
3 files changed, 165 insertions(+), 68 deletions(-)
base-commit: 52d980df51c607867e40e11eef125cb51f8769a5
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/5] ublk: check recovery flags for validity
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
@ 2024-10-02 22:09 ` Uday Shankar
2024-10-02 22:09 ` [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers Uday Shankar
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
Setting UBLK_F_USER_RECOVERY_REISSUE without also setting
UBLK_F_USER_RECOVERY is currently silently equivalent to not setting any
recovery flags at all, even though that's obviously not intended. Check
for this case and fail add_dev (with a paranoid warning to aid debugging
any program which might rely on the old behavior) with EINVAL if it is
detected.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index a6c8e5cc6051..318a4dfe8266 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -62,6 +62,9 @@
| UBLK_F_USER_COPY \
| UBLK_F_ZONED)
+#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
+ | UBLK_F_USER_RECOVERY_REISSUE)
+
/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL \
(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
@@ -2372,6 +2375,14 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
else if (!(info.flags & UBLK_F_UNPRIVILEGED_DEV))
return -EPERM;
+ /* forbid nonsense combinations of recovery flags */
+ if ((info.flags & UBLK_F_USER_RECOVERY_REISSUE) &&
+ !(info.flags & UBLK_F_USER_RECOVERY)) {
+ pr_warn("%s: invalid recovery flags %llx\n", __func__,
+ info.flags & UBLK_F_ALL_RECOVERY_FLAGS);
+ return -EINVAL;
+ }
+
/*
* unprivileged device can't be trusted, but RECOVERY and
* RECOVERY_REISSUE still may hang error handling, so can't
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
2024-10-02 22:09 ` [PATCH v3 1/5] ublk: check recovery flags for validity Uday Shankar
@ 2024-10-02 22:09 ` Uday Shankar
2024-10-04 15:32 ` Ming Lei
2024-10-02 22:09 ` [PATCH v3 3/5] ublk: merge stop_work and quiesce_work Uday Shankar
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
ublk currently supports the following behaviors on ublk server exit:
A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue
and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:
1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands
The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:
default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
We can't easily change the userspace interface to allow independent
selection of one of {A, B, C} and one of {1, 2}, but we can refactor the
internal helpers which test for the flags. Replace the existing helpers
with the following set:
ublk_nosrv_should_reissue_outstanding: tests for behavior C
ublk_nosrv_[dev_]should_queue_io: tests for behavior B
ublk_nosrv_should_stop_dev: tests for behavior 1
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
Changes since v2 (https://lore.kernel.org/linux-block/20240917002155.2044225-3-ushankar@purestorage.com/):
- Remove redundant test from ublk_nosrv_should_stop_dev
drivers/block/ublk_drv.c | 62 +++++++++++++++++++++++++++-------------
1 file changed, 42 insertions(+), 20 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 318a4dfe8266..a838ba809445 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -675,22 +675,44 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
PAGE_SIZE);
}
-static inline bool ublk_queue_can_use_recovery_reissue(
- struct ublk_queue *ubq)
+/*
+ * Should I/O outstanding to the ublk server when it exits be reissued?
+ * If not, outstanding I/O will get errors.
+ */
+static inline bool ublk_nosrv_should_reissue_outstanding(struct ublk_device *ub)
{
- return (ubq->flags & UBLK_F_USER_RECOVERY) &&
- (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE);
+ return (ub->dev_info.flags & UBLK_F_USER_RECOVERY) &&
+ (ub->dev_info.flags & UBLK_F_USER_RECOVERY_REISSUE);
}
-static inline bool ublk_queue_can_use_recovery(
- struct ublk_queue *ubq)
+/*
+ * Should I/O issued while there is no ublk server queue? If not, I/O
+ * issued while there is no ublk server will get errors.
+ */
+static inline bool ublk_nosrv_dev_should_queue_io(struct ublk_device *ub)
+{
+ return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
+}
+
+/*
+ * Same as ublk_nosrv_dev_should_queue_io, but uses a queue-local copy
+ * of the device flags for smaller cache footprint - better for fast
+ * paths.
+ */
+static inline bool ublk_nosrv_should_queue_io(struct ublk_queue *ubq)
{
return ubq->flags & UBLK_F_USER_RECOVERY;
}
-static inline bool ublk_can_use_recovery(struct ublk_device *ub)
+/*
+ * Should ublk devices be stopped (i.e. no recovery possible) when the
+ * ublk server exits? If not, devices can be used again by a future
+ * incarnation of a ublk server via the start_recovery/end_recovery
+ * commands.
+ */
+static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
{
- return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
+ return !(ub->dev_info.flags & UBLK_F_USER_RECOVERY);
}
static void ublk_free_disk(struct gendisk *disk)
@@ -1066,7 +1088,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
{
WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
- if (ublk_queue_can_use_recovery_reissue(ubq))
+ if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
blk_mq_requeue_request(req, false);
else
ublk_put_req_ref(ubq, req);
@@ -1094,7 +1116,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
struct request *rq)
{
/* We cannot process this rq so just requeue it. */
- if (ublk_queue_can_use_recovery(ubq))
+ if (ublk_nosrv_dev_should_queue_io(ubq->dev))
blk_mq_requeue_request(rq, false);
else
blk_mq_end_request(rq, BLK_STS_IOERR);
@@ -1239,10 +1261,10 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
struct ublk_device *ub = ubq->dev;
if (ublk_abort_requests(ub, ubq)) {
- if (ublk_can_use_recovery(ub))
- schedule_work(&ub->quiesce_work);
- else
+ if (ublk_nosrv_should_stop_dev(ub))
schedule_work(&ub->stop_work);
+ else
+ schedule_work(&ub->quiesce_work);
}
return BLK_EH_DONE;
}
@@ -1271,7 +1293,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
* Note: force_abort is guaranteed to be seen because it is set
* before request queue is unqiuesced.
*/
- if (ublk_queue_can_use_recovery(ubq) && unlikely(ubq->force_abort))
+ if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
return BLK_STS_IOERR;
if (unlikely(ubq->canceling)) {
@@ -1492,10 +1514,10 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
ublk_cancel_cmd(ubq, io, issue_flags);
if (need_schedule) {
- if (ublk_can_use_recovery(ub))
- schedule_work(&ub->quiesce_work);
- else
+ if (ublk_nosrv_should_stop_dev(ub))
schedule_work(&ub->stop_work);
+ else
+ schedule_work(&ub->quiesce_work);
}
}
@@ -1600,7 +1622,7 @@ static void ublk_stop_dev(struct ublk_device *ub)
mutex_lock(&ub->mutex);
if (ub->dev_info.state == UBLK_S_DEV_DEAD)
goto unlock;
- if (ublk_can_use_recovery(ub)) {
+ if (ublk_nosrv_dev_should_queue_io(ub)) {
if (ub->dev_info.state == UBLK_S_DEV_LIVE)
__ublk_quiesce_dev(ub);
ublk_unquiesce_dev(ub);
@@ -2701,7 +2723,7 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
int i;
mutex_lock(&ub->mutex);
- if (!ublk_can_use_recovery(ub))
+ if (ublk_nosrv_should_stop_dev(ub))
goto out_unlock;
if (!ub->nr_queues_ready)
goto out_unlock;
@@ -2754,7 +2776,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
__func__, ub->dev_info.nr_hw_queues, header->dev_id);
mutex_lock(&ub->mutex);
- if (!ublk_can_use_recovery(ub))
+ if (ublk_nosrv_should_stop_dev(ub))
goto out_unlock;
if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 3/5] ublk: merge stop_work and quiesce_work
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
2024-10-02 22:09 ` [PATCH v3 1/5] ublk: check recovery flags for validity Uday Shankar
2024-10-02 22:09 ` [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers Uday Shankar
@ 2024-10-02 22:09 ` Uday Shankar
2024-10-02 22:09 ` [PATCH v3 4/5] ublk: support device recovery without I/O queueing Uday Shankar
2024-10-02 22:09 ` [PATCH v3 5/5] Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO Uday Shankar
4 siblings, 0 replies; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
Save some lines by merging stop_work and quiesce_work into nosrv_work,
which looks at the recovery flags and does the right thing when the "no
ublk server" condition is detected.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 64 ++++++++++++++++------------------------
1 file changed, 25 insertions(+), 39 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index a838ba809445..d5edef7bde43 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -182,8 +182,7 @@ struct ublk_device {
unsigned int nr_queues_ready;
unsigned int nr_privileged_daemon;
- struct work_struct quiesce_work;
- struct work_struct stop_work;
+ struct work_struct nosrv_work;
};
/* header of ublk_params */
@@ -1261,10 +1260,7 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
struct ublk_device *ub = ubq->dev;
if (ublk_abort_requests(ub, ubq)) {
- if (ublk_nosrv_should_stop_dev(ub))
- schedule_work(&ub->stop_work);
- else
- schedule_work(&ub->quiesce_work);
+ schedule_work(&ub->nosrv_work);
}
return BLK_EH_DONE;
}
@@ -1514,10 +1510,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
ublk_cancel_cmd(ubq, io, issue_flags);
if (need_schedule) {
- if (ublk_nosrv_should_stop_dev(ub))
- schedule_work(&ub->stop_work);
- else
- schedule_work(&ub->quiesce_work);
+ schedule_work(&ub->nosrv_work);
}
}
@@ -1580,20 +1573,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub)
ub->dev_info.state = UBLK_S_DEV_QUIESCED;
}
-static void ublk_quiesce_work_fn(struct work_struct *work)
-{
- struct ublk_device *ub =
- container_of(work, struct ublk_device, quiesce_work);
-
- mutex_lock(&ub->mutex);
- if (ub->dev_info.state != UBLK_S_DEV_LIVE)
- goto unlock;
- __ublk_quiesce_dev(ub);
- unlock:
- mutex_unlock(&ub->mutex);
- ublk_cancel_dev(ub);
-}
-
static void ublk_unquiesce_dev(struct ublk_device *ub)
{
int i;
@@ -1642,6 +1621,25 @@ static void ublk_stop_dev(struct ublk_device *ub)
ublk_cancel_dev(ub);
}
+static void ublk_nosrv_work(struct work_struct *work)
+{
+ struct ublk_device *ub =
+ container_of(work, struct ublk_device, nosrv_work);
+
+ if (ublk_nosrv_should_stop_dev(ub)) {
+ ublk_stop_dev(ub);
+ return;
+ }
+
+ mutex_lock(&ub->mutex);
+ if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ goto unlock;
+ __ublk_quiesce_dev(ub);
+ unlock:
+ mutex_unlock(&ub->mutex);
+ ublk_cancel_dev(ub);
+}
+
/* device can only be started after all IOs are ready */
static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
{
@@ -2155,14 +2153,6 @@ static int ublk_add_chdev(struct ublk_device *ub)
return ret;
}
-static void ublk_stop_work_fn(struct work_struct *work)
-{
- struct ublk_device *ub =
- container_of(work, struct ublk_device, stop_work);
-
- ublk_stop_dev(ub);
-}
-
/* align max io buffer size with PAGE_SIZE */
static void ublk_align_max_io_size(struct ublk_device *ub)
{
@@ -2187,8 +2177,7 @@ static int ublk_add_tag_set(struct ublk_device *ub)
static void ublk_remove(struct ublk_device *ub)
{
ublk_stop_dev(ub);
- cancel_work_sync(&ub->stop_work);
- cancel_work_sync(&ub->quiesce_work);
+ cancel_work_sync(&ub->nosrv_work);
cdev_device_del(&ub->cdev, &ub->cdev_dev);
ublk_put_device(ub);
ublks_added--;
@@ -2448,8 +2437,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
goto out_unlock;
mutex_init(&ub->mutex);
spin_lock_init(&ub->lock);
- INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
- INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
+ INIT_WORK(&ub->nosrv_work, ublk_nosrv_work);
ret = ublk_alloc_dev_number(ub, header->dev_id);
if (ret < 0)
@@ -2584,9 +2572,7 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd)
static int ublk_ctrl_stop_dev(struct ublk_device *ub)
{
ublk_stop_dev(ub);
- cancel_work_sync(&ub->stop_work);
- cancel_work_sync(&ub->quiesce_work);
-
+ cancel_work_sync(&ub->nosrv_work);
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 4/5] ublk: support device recovery without I/O queueing
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
` (2 preceding siblings ...)
2024-10-02 22:09 ` [PATCH v3 3/5] ublk: merge stop_work and quiesce_work Uday Shankar
@ 2024-10-02 22:09 ` Uday Shankar
2024-10-04 15:34 ` Ming Lei
2024-10-02 22:09 ` [PATCH v3 5/5] Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO Uday Shankar
4 siblings, 1 reply; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
ublk currently supports the following behaviors on ublk server exit:
A: outstanding I/Os get errors, subsequently issued I/Os get errors
B: outstanding I/Os get errors, subsequently issued I/Os queue
C: outstanding I/Os get reissued, subsequently issued I/Os queue
and the following behaviors for recovery of preexisting block devices by
a future incarnation of the ublk server:
1: ublk devices stopped on ublk server exit (no recovery possible)
2: ublk devices are recoverable using start/end_recovery commands
The userspace interface allows selection of combinations of these
behaviors using flags specified at device creation time, namely:
default behavior: A + 1
UBLK_F_USER_RECOVERY: B + 2
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
The behavior A + 2 is currently unsupported. Add support for this
behavior under the new flag combination
UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_FAIL_IO.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
Changes since v2 (https://lore.kernel.org/linux-block/20240917002155.2044225-5-ushankar@purestorage.com/):
- Clean up logic in ublk_ctrl_end_recovery
drivers/block/ublk_drv.c | 78 ++++++++++++++++++++++++++++-------
include/uapi/linux/ublk_cmd.h | 18 ++++++++
2 files changed, 81 insertions(+), 15 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index d5edef7bde43..f2a05dcbc58b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -60,10 +60,12 @@
| UBLK_F_UNPRIVILEGED_DEV \
| UBLK_F_CMD_IOCTL_ENCODE \
| UBLK_F_USER_COPY \
- | UBLK_F_ZONED)
+ | UBLK_F_ZONED \
+ | UBLK_F_USER_RECOVERY_FAIL_IO)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
- | UBLK_F_USER_RECOVERY_REISSUE)
+ | UBLK_F_USER_RECOVERY_REISSUE \
+ | UBLK_F_USER_RECOVERY_FAIL_IO)
/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL \
@@ -146,6 +148,7 @@ struct ublk_queue {
bool force_abort;
bool timeout;
bool canceling;
+ bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
unsigned short nr_io_ready; /* how many ios setup */
spinlock_t cancel_lock;
struct ublk_device *dev;
@@ -690,7 +693,8 @@ static inline bool ublk_nosrv_should_reissue_outstanding(struct ublk_device *ub)
*/
static inline bool ublk_nosrv_dev_should_queue_io(struct ublk_device *ub)
{
- return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
+ return (ub->dev_info.flags & UBLK_F_USER_RECOVERY) &&
+ !(ub->dev_info.flags & UBLK_F_USER_RECOVERY_FAIL_IO);
}
/*
@@ -700,7 +704,8 @@ static inline bool ublk_nosrv_dev_should_queue_io(struct ublk_device *ub)
*/
static inline bool ublk_nosrv_should_queue_io(struct ublk_queue *ubq)
{
- return ubq->flags & UBLK_F_USER_RECOVERY;
+ return (ubq->flags & UBLK_F_USER_RECOVERY) &&
+ !(ubq->flags & UBLK_F_USER_RECOVERY_FAIL_IO);
}
/*
@@ -714,6 +719,12 @@ static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
return !(ub->dev_info.flags & UBLK_F_USER_RECOVERY);
}
+static inline bool ublk_dev_in_recoverable_state(struct ublk_device *ub)
+{
+ return ub->dev_info.state == UBLK_S_DEV_QUIESCED ||
+ ub->dev_info.state == UBLK_S_DEV_FAIL_IO;
+}
+
static void ublk_free_disk(struct gendisk *disk)
{
struct ublk_device *ub = disk->private_data;
@@ -1275,6 +1286,10 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
struct request *rq = bd->rq;
blk_status_t res;
+ if (unlikely(ubq->fail_io)) {
+ return BLK_STS_TARGET;
+ }
+
/* fill iod to slot in io cmd buffer */
res = ublk_setup_iod(ubq, rq);
if (unlikely(res != BLK_STS_OK))
@@ -1625,6 +1640,7 @@ static void ublk_nosrv_work(struct work_struct *work)
{
struct ublk_device *ub =
container_of(work, struct ublk_device, nosrv_work);
+ int i;
if (ublk_nosrv_should_stop_dev(ub)) {
ublk_stop_dev(ub);
@@ -1634,7 +1650,18 @@ static void ublk_nosrv_work(struct work_struct *work)
mutex_lock(&ub->mutex);
if (ub->dev_info.state != UBLK_S_DEV_LIVE)
goto unlock;
- __ublk_quiesce_dev(ub);
+
+ if (ublk_nosrv_dev_should_queue_io(ub)) {
+ __ublk_quiesce_dev(ub);
+ } else {
+ blk_mq_quiesce_queue(ub->ub_disk->queue);
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ ublk_get_queue(ub, i)->fail_io = true;
+ }
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
+ }
+
unlock:
mutex_unlock(&ub->mutex);
ublk_cancel_dev(ub);
@@ -2387,8 +2414,13 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
return -EPERM;
/* forbid nonsense combinations of recovery flags */
- if ((info.flags & UBLK_F_USER_RECOVERY_REISSUE) &&
- !(info.flags & UBLK_F_USER_RECOVERY)) {
+ switch (info.flags & UBLK_F_ALL_RECOVERY_FLAGS) {
+ case 0:
+ case UBLK_F_USER_RECOVERY:
+ case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE):
+ case (UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_FAIL_IO):
+ break;
+ default:
pr_warn("%s: invalid recovery flags %llx\n", __func__,
info.flags & UBLK_F_ALL_RECOVERY_FLAGS);
return -EINVAL;
@@ -2720,14 +2752,18 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
* and related io_uring ctx is freed so file struct of /dev/ublkcX is
* released.
*
+ * and one of the following holds
+ *
* (2) UBLK_S_DEV_QUIESCED is set, which means the quiesce_work:
* (a)has quiesced request queue
* (b)has requeued every inflight rqs whose io_flags is ACTIVE
* (c)has requeued/aborted every inflight rqs whose io_flags is NOT ACTIVE
* (d)has completed/camceled all ioucmds owned by ther dying process
+ *
+ * (3) UBLK_S_DEV_FAIL_IO is set, which means the queue is not
+ * quiesced, but all I/O is being immediately errored
*/
- if (test_bit(UB_STATE_OPEN, &ub->state) ||
- ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+ if (test_bit(UB_STATE_OPEN, &ub->state) || !ublk_dev_in_recoverable_state(ub)) {
ret = -EBUSY;
goto out_unlock;
}
@@ -2751,6 +2787,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
int ublksrv_pid = (int)header->data[0];
int ret = -EINVAL;
+ int i;
pr_devel("%s: Waiting for new ubq_daemons(nr: %d) are ready, dev id %d...\n",
__func__, ub->dev_info.nr_hw_queues, header->dev_id);
@@ -2765,18 +2802,29 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
if (ublk_nosrv_should_stop_dev(ub))
goto out_unlock;
- if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+ if (!ublk_dev_in_recoverable_state(ub)) {
ret = -EBUSY;
goto out_unlock;
}
ub->dev_info.ublksrv_pid = ublksrv_pid;
pr_devel("%s: new ublksrv_pid %d, dev id %d\n",
__func__, ublksrv_pid, header->dev_id);
- blk_mq_unquiesce_queue(ub->ub_disk->queue);
- pr_devel("%s: queue unquiesced, dev id %d.\n",
- __func__, header->dev_id);
- blk_mq_kick_requeue_list(ub->ub_disk->queue);
- ub->dev_info.state = UBLK_S_DEV_LIVE;
+
+ if (ublk_nosrv_dev_should_queue_io(ub)) {
+ ub->dev_info.state = UBLK_S_DEV_LIVE;
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ pr_devel("%s: queue unquiesced, dev id %d.\n",
+ __func__, header->dev_id);
+ blk_mq_kick_requeue_list(ub->ub_disk->queue);
+ } else {
+ blk_mq_quiesce_queue(ub->ub_disk->queue);
+ ub->dev_info.state = UBLK_S_DEV_LIVE;
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ ublk_get_queue(ub, i)->fail_io = false;
+ }
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ }
+
ret = 0;
out_unlock:
mutex_unlock(&ub->mutex);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index c8dc5f8ea699..a2b3ea344639 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -147,8 +147,18 @@
*/
#define UBLK_F_NEED_GET_DATA (1UL << 2)
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is met with errors
+ * - I/O issued while there is no ublk server queues
+ */
#define UBLK_F_USER_RECOVERY (1UL << 3)
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is reissued
+ * - I/O issued while there is no ublk server queues
+ */
#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)
/*
@@ -184,10 +194,18 @@
*/
#define UBLK_F_ZONED (1ULL << 8)
+/*
+ * - Block devices are recoverable if ublk server exits and restarts
+ * - Outstanding I/O when ublk server exits is met with errors
+ * - I/O issued while there is no ublk server is met with errors
+ */
+#define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
#define UBLK_S_DEV_QUIESCED 2
+#define UBLK_S_DEV_FAIL_IO 3
/* shipped via sqe->cmd of io_uring command */
struct ublksrv_ctrl_cmd {
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 5/5] Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
` (3 preceding siblings ...)
2024-10-02 22:09 ` [PATCH v3 4/5] ublk: support device recovery without I/O queueing Uday Shankar
@ 2024-10-02 22:09 ` Uday Shankar
4 siblings, 0 replies; 8+ messages in thread
From: Uday Shankar @ 2024-10-02 22:09 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Jonathan Corbet
Cc: Uday Shankar, linux-block, linux-doc
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
New in v3
Documentation/block/ublk.rst | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index ff74b3ec4a98..51665a3e6a50 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -199,24 +199,36 @@ managing and controlling ublk devices with help of several control commands:
- user recovery feature description
- Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
- ``UBLK_F_USER_RECOVERY_REISSUE``.
-
- With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
+ Three new features are added for user recovery: ``UBLK_F_USER_RECOVERY``,
+ ``UBLK_F_USER_RECOVERY_REISSUE``, and ``UBLK_F_USER_RECOVERY_FAIL_IO``. To
+ enable recovery of ublk devices after the ublk server exits, the ublk server
+ should specify the ``UBLK_F_USER_RECOVERY`` flag when creating the device. The
+ ublk server may additionally specify at most one of
+ ``UBLK_F_USER_RECOVERY_REISSUE`` and ``UBLK_F_USER_RECOVERY_FAIL_IO`` to
+ modify how I/O is handled while the ublk server is dying/dead (this is called
+ the ``nosrv`` case in the driver code).
+
+ With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
recovery stage and ublk device ID is kept. It is ublk server's
responsibility to recover the device context by its own knowledge.
Requests which have not been issued to userspace are requeued. Requests
which have been issued to userspace are aborted.
- With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublk
- server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+ With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
+ (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
requests which have been issued to userspace are requeued and will be
re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
double-write since the driver may issue the same I/O request twice. It
might be useful to a read-only FS or a VM backend.
+ With ``UBLK_F_USER_RECOVERY_FAIL_IO`` additionally set, after the ublk server
+ exits, requests which have issued to userspace are failed, as are any
+ subsequently issued requests. Applications continuously issuing I/O against
+ devices with this flag set will see a stream of I/O errors until a new ublk
+ server recovers the device.
+
Unprivileged ublk device is supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
Once the flag is set, all control commands can be sent by unprivileged
user. Except for command of ``UBLK_CMD_ADD_DEV``, permission check on
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers
2024-10-02 22:09 ` [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers Uday Shankar
@ 2024-10-04 15:32 ` Ming Lei
0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2024-10-04 15:32 UTC (permalink / raw)
To: Uday Shankar; +Cc: Jens Axboe, Jonathan Corbet, linux-block, linux-doc
On Wed, Oct 02, 2024 at 04:09:46PM -0600, Uday Shankar wrote:
> ublk currently supports the following behaviors on ublk server exit:
>
> A: outstanding I/Os get errors, subsequently issued I/Os get errors
> B: outstanding I/Os get errors, subsequently issued I/Os queue
> C: outstanding I/Os get reissued, subsequently issued I/Os queue
>
> and the following behaviors for recovery of preexisting block devices by
> a future incarnation of the ublk server:
>
> 1: ublk devices stopped on ublk server exit (no recovery possible)
> 2: ublk devices are recoverable using start/end_recovery commands
>
> The userspace interface allows selection of combinations of these
> behaviors using flags specified at device creation time, namely:
>
> default behavior: A + 1
> UBLK_F_USER_RECOVERY: B + 2
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
>
> We can't easily change the userspace interface to allow independent
> selection of one of {A, B, C} and one of {1, 2}, but we can refactor the
> internal helpers which test for the flags. Replace the existing helpers
> with the following set:
>
> ublk_nosrv_should_reissue_outstanding: tests for behavior C
> ublk_nosrv_[dev_]should_queue_io: tests for behavior B
> ublk_nosrv_should_stop_dev: tests for behavior 1
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Thanks,
Ming
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 4/5] ublk: support device recovery without I/O queueing
2024-10-02 22:09 ` [PATCH v3 4/5] ublk: support device recovery without I/O queueing Uday Shankar
@ 2024-10-04 15:34 ` Ming Lei
0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2024-10-04 15:34 UTC (permalink / raw)
To: Uday Shankar; +Cc: Jens Axboe, Jonathan Corbet, linux-block, linux-doc
On Wed, Oct 02, 2024 at 04:09:48PM -0600, Uday Shankar wrote:
> ublk currently supports the following behaviors on ublk server exit:
>
> A: outstanding I/Os get errors, subsequently issued I/Os get errors
> B: outstanding I/Os get errors, subsequently issued I/Os queue
> C: outstanding I/Os get reissued, subsequently issued I/Os queue
>
> and the following behaviors for recovery of preexisting block devices by
> a future incarnation of the ublk server:
>
> 1: ublk devices stopped on ublk server exit (no recovery possible)
> 2: ublk devices are recoverable using start/end_recovery commands
>
> The userspace interface allows selection of combinations of these
> behaviors using flags specified at device creation time, namely:
>
> default behavior: A + 1
> UBLK_F_USER_RECOVERY: B + 2
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2
>
> The behavior A + 2 is currently unsupported. Add support for this
> behavior under the new flag combination
> UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_FAIL_IO.
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
> Changes since v2 (https://lore.kernel.org/linux-block/20240917002155.2044225-5-ushankar@purestorage.com/):
> - Clean up logic in ublk_ctrl_end_recovery
>
> drivers/block/ublk_drv.c | 78 ++++++++++++++++++++++++++++-------
> include/uapi/linux/ublk_cmd.h | 18 ++++++++
> 2 files changed, 81 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index d5edef7bde43..f2a05dcbc58b 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -60,10 +60,12 @@
> | UBLK_F_UNPRIVILEGED_DEV \
> | UBLK_F_CMD_IOCTL_ENCODE \
> | UBLK_F_USER_COPY \
> - | UBLK_F_ZONED)
> + | UBLK_F_ZONED \
> + | UBLK_F_USER_RECOVERY_FAIL_IO)
>
> #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> - | UBLK_F_USER_RECOVERY_REISSUE)
> + | UBLK_F_USER_RECOVERY_REISSUE \
> + | UBLK_F_USER_RECOVERY_FAIL_IO)
>
> /* All UBLK_PARAM_TYPE_* should be included here */
> #define UBLK_PARAM_TYPE_ALL \
> @@ -146,6 +148,7 @@ struct ublk_queue {
> bool force_abort;
> bool timeout;
> bool canceling;
> + bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
> unsigned short nr_io_ready; /* how many ios setup */
> spinlock_t cancel_lock;
> struct ublk_device *dev;
> @@ -690,7 +693,8 @@ static inline bool ublk_nosrv_should_reissue_outstanding(struct ublk_device *ub)
> */
> static inline bool ublk_nosrv_dev_should_queue_io(struct ublk_device *ub)
> {
> - return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
> + return (ub->dev_info.flags & UBLK_F_USER_RECOVERY) &&
> + !(ub->dev_info.flags & UBLK_F_USER_RECOVERY_FAIL_IO);
> }
>
> /*
> @@ -700,7 +704,8 @@ static inline bool ublk_nosrv_dev_should_queue_io(struct ublk_device *ub)
> */
> static inline bool ublk_nosrv_should_queue_io(struct ublk_queue *ubq)
> {
> - return ubq->flags & UBLK_F_USER_RECOVERY;
> + return (ubq->flags & UBLK_F_USER_RECOVERY) &&
> + !(ubq->flags & UBLK_F_USER_RECOVERY_FAIL_IO);
> }
>
> /*
> @@ -714,6 +719,12 @@ static inline bool ublk_nosrv_should_stop_dev(struct ublk_device *ub)
> return !(ub->dev_info.flags & UBLK_F_USER_RECOVERY);
> }
>
> +static inline bool ublk_dev_in_recoverable_state(struct ublk_device *ub)
> +{
> + return ub->dev_info.state == UBLK_S_DEV_QUIESCED ||
> + ub->dev_info.state == UBLK_S_DEV_FAIL_IO;
> +}
> +
> static void ublk_free_disk(struct gendisk *disk)
> {
> struct ublk_device *ub = disk->private_data;
> @@ -1275,6 +1286,10 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> struct request *rq = bd->rq;
> blk_status_t res;
>
> + if (unlikely(ubq->fail_io)) {
> + return BLK_STS_TARGET;
> + }
> +
> /* fill iod to slot in io cmd buffer */
> res = ublk_setup_iod(ubq, rq);
> if (unlikely(res != BLK_STS_OK))
> @@ -1625,6 +1640,7 @@ static void ublk_nosrv_work(struct work_struct *work)
> {
> struct ublk_device *ub =
> container_of(work, struct ublk_device, nosrv_work);
> + int i;
>
> if (ublk_nosrv_should_stop_dev(ub)) {
> ublk_stop_dev(ub);
> @@ -1634,7 +1650,18 @@ static void ublk_nosrv_work(struct work_struct *work)
> mutex_lock(&ub->mutex);
> if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> goto unlock;
> - __ublk_quiesce_dev(ub);
> +
> + if (ublk_nosrv_dev_should_queue_io(ub)) {
> + __ublk_quiesce_dev(ub);
> + } else {
> + blk_mq_quiesce_queue(ub->ub_disk->queue);
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + ublk_get_queue(ub, i)->fail_io = true;
> + }
> + blk_mq_unquiesce_queue(ub->ub_disk->queue);
> + ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
The above state update should be moved before blk_mq_unquiesce_queue().
Otherwise, this patch is fine.
Thanks,
Ming
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-10-04 15:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-02 22:09 [PATCH v3 0/5] ublk: support device recovery without I/O queueing Uday Shankar
2024-10-02 22:09 ` [PATCH v3 1/5] ublk: check recovery flags for validity Uday Shankar
2024-10-02 22:09 ` [PATCH v3 2/5] ublk: refactor recovery configuration flag helpers Uday Shankar
2024-10-04 15:32 ` Ming Lei
2024-10-02 22:09 ` [PATCH v3 3/5] ublk: merge stop_work and quiesce_work Uday Shankar
2024-10-02 22:09 ` [PATCH v3 4/5] ublk: support device recovery without I/O queueing Uday Shankar
2024-10-04 15:34 ` Ming Lei
2024-10-02 22:09 ` [PATCH v3 5/5] Documentation: ublk: document UBLK_F_USER_RECOVERY_FAIL_IO Uday Shankar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).