* [PATCH 0/3] ublk: add UBLK_F_QUIESCE
@ 2025-05-22 16:35 Ming Lei
2025-05-22 16:35 ` [PATCH 1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE Ming Lei
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Ming Lei @ 2025-05-22 16:35 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Yoav Cohen, Ming Lei
Hi Jens,
The 1st patch adds test case for covering UBLK_U_CMD_UPDATE_SIZE which is
added recently.
The 2nd patch adds UBLK_F_QUIESCE for supporting to quiesce device in grace
way, and typical use case is to upgrade ublk server, meantime keep ublk
block device online.
The last patch adds test case for UBLK_F_QUIESCE.
Thanks,
Ming
Ming Lei (3):
selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
ublk: add feature UBLK_F_QUIESCE
selftests: ublk: add test for UBLK_F_QUIESCE
drivers/block/ublk_drv.c | 124 +++++++++++++++++-
include/uapi/linux/ublk_cmd.h | 19 +++
tools/testing/selftests/ublk/Makefile | 2 +
tools/testing/selftests/ublk/kublk.c | 94 ++++++++++++-
tools/testing/selftests/ublk/kublk.h | 3 +
tools/testing/selftests/ublk/test_common.sh | 37 +++++-
.../testing/selftests/ublk/test_generic_04.sh | 2 +-
.../testing/selftests/ublk/test_generic_05.sh | 2 +-
.../testing/selftests/ublk/test_generic_10.sh | 30 +++++
.../testing/selftests/ublk/test_generic_11.sh | 44 +++++++
10 files changed, 350 insertions(+), 7 deletions(-)
create mode 100755 tools/testing/selftests/ublk/test_generic_10.sh
create mode 100755 tools/testing/selftests/ublk/test_generic_11.sh
--
2.47.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
2025-05-22 16:35 [PATCH 0/3] ublk: add UBLK_F_QUIESCE Ming Lei
@ 2025-05-22 16:35 ` Ming Lei
2025-05-22 16:35 ` [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE Ming Lei
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-05-22 16:35 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Yoav Cohen, Ming Lei,
Omri Mann, Jared Holzman
Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE.
Add 'update_size -s|--size size_in_bytes' sub-command on ublk utility for
supporting this feature, then verify the feature via generic_10.
Cc: Omri Mann <omri@nvidia.com>
Cc: Jared Holzman <jholzman@nvidia.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 55 ++++++++++++++++++-
tools/testing/selftests/ublk/kublk.h | 3 +
tools/testing/selftests/ublk/test_common.sh | 5 ++
.../testing/selftests/ublk/test_generic_10.sh | 30 ++++++++++
5 files changed, 93 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/ublk/test_generic_10.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 2a2ef0cb54bc..81b8ed6b92e2 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS += test_generic_07.sh
TEST_PROGS += test_generic_08.sh
TEST_PROGS += test_generic_09.sh
+TEST_PROGS += test_generic_10.sh
TEST_PROGS += test_null_01.sh
TEST_PROGS += test_null_02.sh
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index c429a20ab51e..683a23078f43 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -216,6 +216,18 @@ static int ublk_ctrl_get_features(struct ublk_dev *dev,
return __ublk_ctrl_cmd(dev, &data);
}
+static int ublk_ctrl_update_size(struct ublk_dev *dev,
+ __u64 nr_sects)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_UPDATE_SIZE,
+ .flags = CTRL_CMD_HAS_DATA,
+ };
+
+ data.data[0] = nr_sects;
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
static const char *ublk_dev_state_desc(struct ublk_dev *dev)
{
switch (dev->dev_info.state) {
@@ -1236,6 +1248,7 @@ static int cmd_dev_get_features(void)
[const_ilog2(UBLK_F_USER_COPY)] = "USER_COPY",
[const_ilog2(UBLK_F_ZONED)] = "ZONED",
[const_ilog2(UBLK_F_USER_RECOVERY_FAIL_IO)] = "RECOVERY_FAIL_IO",
+ [const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE",
[const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG",
};
struct ublk_dev *dev;
@@ -1270,6 +1283,39 @@ static int cmd_dev_get_features(void)
return ret;
}
+static int cmd_dev_update_size(struct dev_ctx *ctx)
+{
+ struct ublk_dev *dev = ublk_ctrl_init();
+ struct ublk_params p;
+ int ret = -EINVAL;
+
+ if (!dev)
+ return -ENODEV;
+
+ if (ctx->dev_id < 0) {
+ fprintf(stderr, "device id isn't provided\n");
+ goto out;
+ }
+
+ dev->dev_info.dev_id = ctx->dev_id;
+ ret = ublk_ctrl_get_params(dev, &p);
+ if (ret < 0) {
+ ublk_err("failed to get params %d %s\n", ret, strerror(-ret));
+ goto out;
+ }
+
+ if (ctx->size & ((1 << p.basic.logical_bs_shift) - 1)) {
+ ublk_err("size isn't aligned with logical block size\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = ublk_ctrl_update_size(dev, ctx->size >> 9);
+out:
+ ublk_ctrl_deinit(dev);
+ return ret;
+}
+
static void __cmd_create_help(char *exe, bool recovery)
{
int i;
@@ -1312,6 +1358,7 @@ static int cmd_dev_help(char *exe)
printf("%s list [-n dev_id] -a \n", exe);
printf("\t -a list all devices, -n list specified device, default -a \n\n");
printf("%s features\n", exe);
+ printf("%s update_size -n dev_id -s|--size size_in_bytes \n", exe);
return 0;
}
@@ -1333,6 +1380,7 @@ int main(int argc, char *argv[])
{ "get_data", 1, NULL, 'g'},
{ "auto_zc", 0, NULL, 0 },
{ "auto_zc_fallback", 0, NULL, 0 },
+ { "size", 1, NULL, 's'},
{ 0, 0, 0, 0 }
};
const struct ublk_tgt_ops *ops = NULL;
@@ -1354,7 +1402,7 @@ int main(int argc, char *argv[])
opterr = 0;
optind = 2;
- while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:gaz",
+ while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:s:gaz",
longopts, &option_idx)) != -1) {
switch (opt) {
case 'a':
@@ -1394,6 +1442,9 @@ int main(int argc, char *argv[])
case 'g':
ctx.flags |= UBLK_F_NEED_GET_DATA;
break;
+ case 's':
+ ctx.size = strtoull(optarg, NULL, 10);
+ break;
case 0:
if (!strcmp(longopts[option_idx].name, "debug_mask"))
ublk_dbg_mask = strtol(optarg, NULL, 16);
@@ -1470,6 +1521,8 @@ int main(int argc, char *argv[])
ret = cmd_dev_help(argv[0]);
else if (!strcmp(cmd, "features"))
ret = cmd_dev_get_features();
+ else if (!strcmp(cmd, "update_size"))
+ ret = cmd_dev_update_size(&ctx);
else
cmd_dev_help(argv[0]);
diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
index 9af930e951a3..e34508bf5798 100644
--- a/tools/testing/selftests/ublk/kublk.h
+++ b/tools/testing/selftests/ublk/kublk.h
@@ -92,6 +92,9 @@ struct dev_ctx {
/* built from shmem, only for ublk_dump_dev() */
struct ublk_dev *shadow_dev;
+ /* for 'update_size' command */
+ unsigned long long size;
+
union {
struct stripe_ctx stripe;
struct fault_inject_ctx fault_inject;
diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh
index c17fd66b73ac..244d886b7eae 100755
--- a/tools/testing/selftests/ublk/test_common.sh
+++ b/tools/testing/selftests/ublk/test_common.sh
@@ -23,6 +23,11 @@ _get_disk_dev_t() {
echo $(( (major & 0xfff) << 20 | (minor & 0xfffff) ))
}
+_get_disk_size()
+{
+ lsblk -b -o SIZE -n "$1"
+}
+
_run_fio_verify_io() {
fio --name=verify --rw=randwrite --direct=1 --ioengine=libaio \
--bs=8k --iodepth=32 --verify=crc32c --do_verify=1 \
diff --git a/tools/testing/selftests/ublk/test_generic_10.sh b/tools/testing/selftests/ublk/test_generic_10.sh
new file mode 100755
index 000000000000..abc11c3d416b
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_10.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_10"
+ERR_CODE=0
+
+if ! _have_feature "UPDATE_SIZE"; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "null" "check update size"
+
+dev_id=$(_add_ublk_dev -t null)
+_check_add_dev $TID $?
+
+size=$(_get_disk_size /dev/ublkb"${dev_id}")
+size=$(( size / 2 ))
+if ! "$UBLK_PROG" update_size -n "$dev_id" -s "$size"; then
+ ERR_CODE=255
+fi
+
+new_size=$(_get_disk_size /dev/ublkb"${dev_id}")
+if [ "$new_size" != "$size" ]; then
+ ERR_CODE=255
+fi
+
+_cleanup_test "null"
+_show_result $TID $ERR_CODE
--
2.47.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-05-22 16:35 [PATCH 0/3] ublk: add UBLK_F_QUIESCE Ming Lei
2025-05-22 16:35 ` [PATCH 1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE Ming Lei
@ 2025-05-22 16:35 ` Ming Lei
2025-06-05 12:37 ` Yoav Cohen
2025-05-22 16:35 ` [PATCH 3/3] selftests: ublk: add test for UBLK_F_QUIESCE Ming Lei
2025-05-23 15:42 ` [PATCH 0/3] ublk: add UBLK_F_QUIESCE Jens Axboe
3 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-05-22 16:35 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Yoav Cohen, Ming Lei
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
cooperation.
This feature can help to support to upgrade ublk server application by
shutting down ublk server gracefully, meantime keep ublk block device
persistent during the upgrading period.
The feature is only available for UBLK_F_USER_RECOVERY.
Suggested-by: Yoav Cohen <yoav@nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
include/uapi/linux/ublk_cmd.h | 19 ++++++
2 files changed, 142 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index fbd075807525..6f51072776f1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -51,6 +51,7 @@
/* private ioctl command mirror */
#define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
#define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
+#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
#define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
#define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
@@ -67,7 +68,8 @@
| UBLK_F_ZONED \
| UBLK_F_USER_RECOVERY_FAIL_IO \
| UBLK_F_UPDATE_SIZE \
- | UBLK_F_AUTO_BUF_REG)
+ | UBLK_F_AUTO_BUF_REG \
+ | UBLK_F_QUIESCE)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
| UBLK_F_USER_RECOVERY_REISSUE \
@@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
return -EINVAL;
}
+ if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
+ pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
+ return -EINVAL;
+ }
+
/*
* unprivileged device can't be trusted, but RECOVERY and
* RECOVERY_REISSUE still may hang error handling, so can't
@@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
mutex_unlock(&ub->mutex);
}
+
+struct count_busy {
+ const struct ublk_queue *ubq;
+ unsigned int nr_busy;
+};
+
+static bool ublk_count_busy_req(struct request *rq, void *data)
+{
+ struct count_busy *idle = data;
+
+ if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
+ idle->nr_busy += 1;
+ return true;
+}
+
+/* uring_cmd is guaranteed to be active if the associated request is idle */
+static bool ubq_has_idle_io(const struct ublk_queue *ubq)
+{
+ struct count_busy data = {
+ .ubq = ubq,
+ };
+
+ blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
+ return data.nr_busy < ubq->q_depth;
+}
+
+/* Wait until each hw queue has at least one idle IO */
+static int ublk_wait_for_idle_io(struct ublk_device *ub,
+ unsigned int timeout_ms)
+{
+ unsigned int elapsed = 0;
+ int ret;
+
+ while (elapsed < timeout_ms && !signal_pending(current)) {
+ unsigned int queues_cancelable = 0;
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ queues_cancelable += !!ubq_has_idle_io(ubq);
+ }
+
+ /*
+ * Each queue needs at least one active command for
+ * notifying ublk server
+ */
+ if (queues_cancelable == ub->dev_info.nr_hw_queues)
+ break;
+
+ msleep(UBLK_REQUEUE_DELAY_MS);
+ elapsed += UBLK_REQUEUE_DELAY_MS;
+ }
+
+ if (signal_pending(current))
+ ret = -EINTR;
+ else if (elapsed >= timeout_ms)
+ ret = -EBUSY;
+ else
+ ret = 0;
+
+ return ret;
+}
+
+static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
+ const struct ublksrv_ctrl_cmd *header)
+{
+ /* zero means wait forever */
+ u64 timeout_ms = header->data[0];
+ struct gendisk *disk;
+ int i, ret = -ENODEV;
+
+ if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
+ return -EOPNOTSUPP;
+
+ mutex_lock(&ub->mutex);
+ disk = ublk_get_disk(ub);
+ if (!disk)
+ goto unlock;
+ if (ub->dev_info.state == UBLK_S_DEV_DEAD)
+ goto put_disk;
+
+ ret = 0;
+ /* already in expected state */
+ if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ goto put_disk;
+
+ /* Mark all queues as canceling */
+ blk_mq_quiesce_queue(disk->queue);
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ ubq->canceling = true;
+ }
+ blk_mq_unquiesce_queue(disk->queue);
+
+ if (!timeout_ms)
+ timeout_ms = UINT_MAX;
+ ret = ublk_wait_for_idle_io(ub, timeout_ms);
+
+put_disk:
+ ublk_put_disk(disk);
+unlock:
+ mutex_unlock(&ub->mutex);
+
+ /* Cancel pending uring_cmd */
+ if (!ret)
+ ublk_cancel_dev(ub);
+ return ret;
+}
+
/*
* All control commands are sent via /dev/ublk-control, so we have to check
* the destination device's permission
@@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
case UBLK_CMD_START_USER_RECOVERY:
case UBLK_CMD_END_USER_RECOVERY:
case UBLK_CMD_UPDATE_SIZE:
+ case UBLK_CMD_QUIESCE_DEV:
mask = MAY_READ | MAY_WRITE;
break;
default:
@@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
ublk_ctrl_set_size(ub, header);
ret = 0;
break;
+ case UBLK_CMD_QUIESCE_DEV:
+ ret = ublk_ctrl_quiesce_dev(ub, header);
+ break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 1c40632cb164..56c7e3fc666f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -53,6 +53,8 @@
_IOR('u', 0x14, struct ublksrv_ctrl_cmd)
#define UBLK_U_CMD_UPDATE_SIZE \
_IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
+#define UBLK_U_CMD_QUIESCE_DEV \
+ _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
/*
* 64bits are enough now, and it should be easy to extend in case of
@@ -253,6 +255,23 @@
*/
#define UBLK_F_AUTO_BUF_REG (1ULL << 11)
+/*
+ * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
+ * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
+ * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
+ * handling `UBLK_IO_RES_ABORT` correctly.
+ *
+ * Typical use case is for supporting to upgrade ublk server application,
+ * meantime keep ublk block device persistent during the period.
+ *
+ * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
+ *
+ * Note, this command returns -EBUSY in case that all IO commands are being
+ * handled by ublk server and not completed in specified time period which
+ * is passed from the control command parameter.
+ */
+#define UBLK_F_QUIESCE (1ULL << 12)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
--
2.47.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3] selftests: ublk: add test for UBLK_F_QUIESCE
2025-05-22 16:35 [PATCH 0/3] ublk: add UBLK_F_QUIESCE Ming Lei
2025-05-22 16:35 ` [PATCH 1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE Ming Lei
2025-05-22 16:35 ` [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE Ming Lei
@ 2025-05-22 16:35 ` Ming Lei
2025-05-23 15:42 ` [PATCH 0/3] ublk: add UBLK_F_QUIESCE Jens Axboe
3 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-05-22 16:35 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Yoav Cohen, Ming Lei
Add test generic_11 for covering new control command of
UBLK_U_CMD_QUIESCE_DEV.
Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning
device state to quiesce states, then verify the feature via generic_10
by doing quiesce and recovery.
Cc: Yoav Cohen <yoav@nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 39 ++++++++++++++++
tools/testing/selftests/ublk/test_common.sh | 32 ++++++++++++--
.../testing/selftests/ublk/test_generic_04.sh | 2 +-
.../testing/selftests/ublk/test_generic_05.sh | 2 +-
.../testing/selftests/ublk/test_generic_11.sh | 44 +++++++++++++++++++
6 files changed, 115 insertions(+), 5 deletions(-)
create mode 100755 tools/testing/selftests/ublk/test_generic_11.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 81b8ed6b92e2..4dde8838261d 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -18,6 +18,7 @@ TEST_PROGS += test_generic_07.sh
TEST_PROGS += test_generic_08.sh
TEST_PROGS += test_generic_09.sh
TEST_PROGS += test_generic_10.sh
+TEST_PROGS += test_generic_11.sh
TEST_PROGS += test_null_01.sh
TEST_PROGS += test_null_02.sh
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index 683a23078f43..b5131a000795 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -228,6 +228,18 @@ static int ublk_ctrl_update_size(struct ublk_dev *dev,
return __ublk_ctrl_cmd(dev, &data);
}
+static int ublk_ctrl_quiesce_dev(struct ublk_dev *dev,
+ unsigned int timeout_ms)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_QUIESCE_DEV,
+ .flags = CTRL_CMD_HAS_DATA,
+ };
+
+ data.data[0] = timeout_ms;
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
static const char *ublk_dev_state_desc(struct ublk_dev *dev)
{
switch (dev->dev_info.state) {
@@ -1053,6 +1065,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
info->nr_hw_queues = nr_queues;
info->queue_depth = depth;
info->flags = ctx->flags;
+ if ((features & UBLK_F_QUIESCE) &&
+ (info->flags & UBLK_F_USER_RECOVERY))
+ info->flags |= UBLK_F_QUIESCE;
dev->tgt.ops = ops;
dev->tgt.sq_depth = depth;
dev->tgt.cq_depth = depth;
@@ -1250,6 +1265,7 @@ static int cmd_dev_get_features(void)
[const_ilog2(UBLK_F_USER_RECOVERY_FAIL_IO)] = "RECOVERY_FAIL_IO",
[const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE",
[const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG",
+ [const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE",
};
struct ublk_dev *dev;
__u64 features = 0;
@@ -1316,6 +1332,26 @@ static int cmd_dev_update_size(struct dev_ctx *ctx)
return ret;
}
+static int cmd_dev_quiesce(struct dev_ctx *ctx)
+{
+ struct ublk_dev *dev = ublk_ctrl_init();
+ int ret = -EINVAL;
+
+ if (!dev)
+ return -ENODEV;
+
+ if (ctx->dev_id < 0) {
+ fprintf(stderr, "device id isn't provided for quiesce\n");
+ goto out;
+ }
+ dev->dev_info.dev_id = ctx->dev_id;
+ ret = ublk_ctrl_quiesce_dev(dev, 10000);
+
+out:
+ ublk_ctrl_deinit(dev);
+ return ret;
+}
+
static void __cmd_create_help(char *exe, bool recovery)
{
int i;
@@ -1359,6 +1395,7 @@ static int cmd_dev_help(char *exe)
printf("\t -a list all devices, -n list specified device, default -a \n\n");
printf("%s features\n", exe);
printf("%s update_size -n dev_id -s|--size size_in_bytes \n", exe);
+ printf("%s quiesce -n dev_id\n", exe);
return 0;
}
@@ -1523,6 +1560,8 @@ int main(int argc, char *argv[])
ret = cmd_dev_get_features();
else if (!strcmp(cmd, "update_size"))
ret = cmd_dev_update_size(&ctx);
+ else if (!strcmp(cmd, "quiesce"))
+ ret = cmd_dev_quiesce(&ctx);
else
cmd_dev_help(argv[0]);
diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh
index 244d886b7eae..0145569ee7e9 100755
--- a/tools/testing/selftests/ublk/test_common.sh
+++ b/tools/testing/selftests/ublk/test_common.sh
@@ -220,6 +220,26 @@ _recover_ublk_dev() {
echo "$state"
}
+# quiesce device and return ublk device state
+__ublk_quiesce_dev()
+{
+ local dev_id=$1
+ local exp_state=$2
+ local state
+
+ if ! ${UBLK_PROG} quiesce -n "${dev_id}"; then
+ state=$(_get_ublk_dev_state "${dev_id}")
+ return "$state"
+ fi
+
+ for ((j=0;j<50;j++)); do
+ state=$(_get_ublk_dev_state "${dev_id}")
+ [ "$state" == "$exp_state" ] && break
+ sleep 1
+ done
+ echo "$state"
+}
+
# kill the ublk daemon and return ublk device state
__ublk_kill_daemon()
{
@@ -308,20 +328,26 @@ run_io_and_kill_daemon()
run_io_and_recover()
{
+ local action=$1
local state
local dev_id
+ shift 1
dev_id=$(_add_ublk_dev "$@")
_check_add_dev "$TID" $?
fio --name=job1 --filename=/dev/ublkb"${dev_id}" --ioengine=libaio \
- --rw=readwrite --iodepth=256 --size="${size}" --numjobs=4 \
+ --rw=randread --iodepth=256 --size="${size}" --numjobs=4 \
--runtime=20 --time_based > /dev/null 2>&1 &
sleep 4
- state=$(__ublk_kill_daemon "${dev_id}" "QUIESCED")
+ if [ "$action" == "kill_daemon" ]; then
+ state=$(__ublk_kill_daemon "${dev_id}" "QUIESCED")
+ elif [ "$action" == "quiesce_dev" ]; then
+ state=$(__ublk_quiesce_dev "${dev_id}" "QUIESCED")
+ fi
if [ "$state" != "QUIESCED" ]; then
- echo "device isn't quiesced($state) after killing daemon"
+ echo "device isn't quiesced($state) after $action"
return 255
fi
diff --git a/tools/testing/selftests/ublk/test_generic_04.sh b/tools/testing/selftests/ublk/test_generic_04.sh
index 8a3bc080c577..8b533217d4a1 100755
--- a/tools/testing/selftests/ublk/test_generic_04.sh
+++ b/tools/testing/selftests/ublk/test_generic_04.sh
@@ -8,7 +8,7 @@ ERR_CODE=0
ublk_run_recover_test()
{
- run_io_and_recover "$@"
+ run_io_and_recover "kill_daemon" "$@"
ERR_CODE=$?
if [ ${ERR_CODE} -ne 0 ]; then
echo "$TID failure: $*"
diff --git a/tools/testing/selftests/ublk/test_generic_05.sh b/tools/testing/selftests/ublk/test_generic_05.sh
index 3bb00a347402..398e9e2b58e1 100755
--- a/tools/testing/selftests/ublk/test_generic_05.sh
+++ b/tools/testing/selftests/ublk/test_generic_05.sh
@@ -8,7 +8,7 @@ ERR_CODE=0
ublk_run_recover_test()
{
- run_io_and_recover "$@"
+ run_io_and_recover "kill_daemon" "$@"
ERR_CODE=$?
if [ ${ERR_CODE} -ne 0 ]; then
echo "$TID failure: $*"
diff --git a/tools/testing/selftests/ublk/test_generic_11.sh b/tools/testing/selftests/ublk/test_generic_11.sh
new file mode 100755
index 000000000000..a00357a5ec6b
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_11.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_11"
+ERR_CODE=0
+
+ublk_run_quiesce_recover()
+{
+ run_io_and_recover "quiesce_dev" "$@"
+ ERR_CODE=$?
+ if [ ${ERR_CODE} -ne 0 ]; then
+ echo "$TID failure: $*"
+ _show_result $TID $ERR_CODE
+ fi
+}
+
+if ! _have_feature "QUIESCE"; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "quiesce" "basic quiesce & recover function verification"
+
+_create_backfile 0 256M
+_create_backfile 1 128M
+_create_backfile 2 128M
+
+ublk_run_quiesce_recover -t null -q 2 -r 1 &
+ublk_run_quiesce_recover -t loop -q 2 -r 1 "${UBLK_BACKFILES[0]}" &
+ublk_run_quiesce_recover -t stripe -q 2 -r 1 "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+wait
+
+ublk_run_quiesce_recover -t null -q 2 -r 1 -i 1 &
+ublk_run_quiesce_recover -t loop -q 2 -r 1 -i 1 "${UBLK_BACKFILES[0]}" &
+ublk_run_quiesce_recover -t stripe -q 2 -r 1 -i 1 "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+wait
+
+_cleanup_test "quiesce"
+_show_result $TID $ERR_CODE
--
2.47.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/3] ublk: add UBLK_F_QUIESCE
2025-05-22 16:35 [PATCH 0/3] ublk: add UBLK_F_QUIESCE Ming Lei
` (2 preceding siblings ...)
2025-05-22 16:35 ` [PATCH 3/3] selftests: ublk: add test for UBLK_F_QUIESCE Ming Lei
@ 2025-05-23 15:42 ` Jens Axboe
3 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2025-05-23 15:42 UTC (permalink / raw)
To: linux-block, Ming Lei; +Cc: Uday Shankar, Caleb Sander Mateos, Yoav Cohen
On Fri, 23 May 2025 00:35:18 +0800, Ming Lei wrote:
> The 1st patch adds test case for covering UBLK_U_CMD_UPDATE_SIZE which is
> added recently.
>
> The 2nd patch adds UBLK_F_QUIESCE for supporting to quiesce device in grace
> way, and typical use case is to upgrade ublk server, meantime keep ublk
> block device online.
>
> [...]
Applied, thanks!
[1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
commit: f40b1f2670f084d2879e4a125a75ff7788ec67ba
[2/3] ublk: add feature UBLK_F_QUIESCE
commit: b465ae7b2524170cb14fa25dbcb84923bfb1a0a9
[3/3] selftests: ublk: add test for UBLK_F_QUIESCE
commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-05-22 16:35 ` [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE Ming Lei
@ 2025-06-05 12:37 ` Yoav Cohen
2025-06-06 9:54 ` Ming Lei
0 siblings, 1 reply; 13+ messages in thread
From: Yoav Cohen @ 2025-06-05 12:37 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, linux-block@vger.kernel.org
Cc: Uday Shankar, Caleb Sander Mateos
Hi Ming,
Thank you for that.
Can you please clarify this
+/* Wait until each hw queue has at least one idle IO */
what do you exactly wait here? and why it is per io queue?
As I understand if the wait timedout the operation will be canceled.
Thank you
________________________________________
From: Ming Lei <ming.lei@redhat.com>
Sent: Thursday, May 22, 2025 7:35 PM
To: Jens Axboe; linux-block@vger.kernel.org
Cc: Uday Shankar; Caleb Sander Mateos; Yoav Cohen; Ming Lei
Subject: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
External email: Use caution opening links or attachments
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
cooperation.
This feature can help to support to upgrade ublk server application by
shutting down ublk server gracefully, meantime keep ublk block device
persistent during the upgrading period.
The feature is only available for UBLK_F_USER_RECOVERY.
Suggested-by: Yoav Cohen <yoav@nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
include/uapi/linux/ublk_cmd.h | 19 ++++++
2 files changed, 142 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index fbd075807525..6f51072776f1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -51,6 +51,7 @@
/* private ioctl command mirror */
#define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
#define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
+#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
#define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
#define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
@@ -67,7 +68,8 @@
| UBLK_F_ZONED \
| UBLK_F_USER_RECOVERY_FAIL_IO \
| UBLK_F_UPDATE_SIZE \
- | UBLK_F_AUTO_BUF_REG)
+ | UBLK_F_AUTO_BUF_REG \
+ | UBLK_F_QUIESCE)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
| UBLK_F_USER_RECOVERY_REISSUE \
@@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
return -EINVAL;
}
+ if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
+ pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
+ return -EINVAL;
+ }
+
/*
* unprivileged device can't be trusted, but RECOVERY and
* RECOVERY_REISSUE still may hang error handling, so can't
@@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
mutex_unlock(&ub->mutex);
}
+
+struct count_busy {
+ const struct ublk_queue *ubq;
+ unsigned int nr_busy;
+};
+
+static bool ublk_count_busy_req(struct request *rq, void *data)
+{
+ struct count_busy *idle = data;
+
+ if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
+ idle->nr_busy += 1;
+ return true;
+}
+
+/* uring_cmd is guaranteed to be active if the associated request is idle */
+static bool ubq_has_idle_io(const struct ublk_queue *ubq)
+{
+ struct count_busy data = {
+ .ubq = ubq,
+ };
+
+ blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
+ return data.nr_busy < ubq->q_depth;
+}
+
+/* Wait until each hw queue has at least one idle IO */
+static int ublk_wait_for_idle_io(struct ublk_device *ub,
+ unsigned int timeout_ms)
+{
+ unsigned int elapsed = 0;
+ int ret;
+
+ while (elapsed < timeout_ms && !signal_pending(current)) {
+ unsigned int queues_cancelable = 0;
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ queues_cancelable += !!ubq_has_idle_io(ubq);
+ }
+
+ /*
+ * Each queue needs at least one active command for
+ * notifying ublk server
+ */
+ if (queues_cancelable == ub->dev_info.nr_hw_queues)
+ break;
+
+ msleep(UBLK_REQUEUE_DELAY_MS);
+ elapsed += UBLK_REQUEUE_DELAY_MS;
+ }
+
+ if (signal_pending(current))
+ ret = -EINTR;
+ else if (elapsed >= timeout_ms)
+ ret = -EBUSY;
+ else
+ ret = 0;
+
+ return ret;
+}
+
+static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
+ const struct ublksrv_ctrl_cmd *header)
+{
+ /* zero means wait forever */
+ u64 timeout_ms = header->data[0];
+ struct gendisk *disk;
+ int i, ret = -ENODEV;
+
+ if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
+ return -EOPNOTSUPP;
+
+ mutex_lock(&ub->mutex);
+ disk = ublk_get_disk(ub);
+ if (!disk)
+ goto unlock;
+ if (ub->dev_info.state == UBLK_S_DEV_DEAD)
+ goto put_disk;
+
+ ret = 0;
+ /* already in expected state */
+ if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ goto put_disk;
+
+ /* Mark all queues as canceling */
+ blk_mq_quiesce_queue(disk->queue);
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ ubq->canceling = true;
+ }
+ blk_mq_unquiesce_queue(disk->queue);
+
+ if (!timeout_ms)
+ timeout_ms = UINT_MAX;
+ ret = ublk_wait_for_idle_io(ub, timeout_ms);
+
+put_disk:
+ ublk_put_disk(disk);
+unlock:
+ mutex_unlock(&ub->mutex);
+
+ /* Cancel pending uring_cmd */
+ if (!ret)
+ ublk_cancel_dev(ub);
+ return ret;
+}
+
/*
* All control commands are sent via /dev/ublk-control, so we have to check
* the destination device's permission
@@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
case UBLK_CMD_START_USER_RECOVERY:
case UBLK_CMD_END_USER_RECOVERY:
case UBLK_CMD_UPDATE_SIZE:
+ case UBLK_CMD_QUIESCE_DEV:
mask = MAY_READ | MAY_WRITE;
break;
default:
@@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
ublk_ctrl_set_size(ub, header);
ret = 0;
break;
+ case UBLK_CMD_QUIESCE_DEV:
+ ret = ublk_ctrl_quiesce_dev(ub, header);
+ break;
default:
ret = -EOPNOTSUPP;
break;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 1c40632cb164..56c7e3fc666f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -53,6 +53,8 @@
_IOR('u', 0x14, struct ublksrv_ctrl_cmd)
#define UBLK_U_CMD_UPDATE_SIZE \
_IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
+#define UBLK_U_CMD_QUIESCE_DEV \
+ _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
/*
* 64bits are enough now, and it should be easy to extend in case of
@@ -253,6 +255,23 @@
*/
#define UBLK_F_AUTO_BUF_REG (1ULL << 11)
+/*
+ * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
+ * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
+ * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
+ * handling `UBLK_IO_RES_ABORT` correctly.
+ *
+ * Typical use case is for supporting to upgrade ublk server application,
+ * meantime keep ublk block device persistent during the period.
+ *
+ * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
+ *
+ * Note, this command returns -EBUSY in case that all IO commands are being
+ * handled by ublk server and not completed in specified time period which
+ * is passed from the control command parameter.
+ */
+#define UBLK_F_QUIESCE (1ULL << 12)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
--
2.47.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-05 12:37 ` Yoav Cohen
@ 2025-06-06 9:54 ` Ming Lei
2025-06-08 10:20 ` Yoav Cohen
0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-06-06 9:54 UTC (permalink / raw)
To: Yoav Cohen
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
Hi Yoav,
On Thu, Jun 05, 2025 at 12:37:01PM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you for that.
> Can you please clarify this
> +/* Wait until each hw queue has at least one idle IO */
> what do you exactly wait here? and why it is per io queue?
> As I understand if the wait timedout the operation will be canceled.
One idle IO means one active io_uring cmd, so we can use this command
for notifying ublk server. Otherwise, ublk server may not get chance to
know the quiesce action.
Because each queue may have standalone task context.
The condition should be satisfied easily in any implementation, but please
let me if it could be one issue in your ublk server implementation.
Big reason is that ublk doesn't have one such admin command for
housekeeping.
Thanks,
Ming
>
> Thank you
>
> ________________________________________
> From: Ming Lei <ming.lei@redhat.com>
> Sent: Thursday, May 22, 2025 7:35 PM
> To: Jens Axboe; linux-block@vger.kernel.org
> Cc: Uday Shankar; Caleb Sander Mateos; Yoav Cohen; Ming Lei
> Subject: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
>
> External email: Use caution opening links or attachments
>
>
> Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
> for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
> or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
> cooperation.
>
> This feature can help to support to upgrade ublk server application by
> shutting down ublk server gracefully, meantime keep ublk block device
> persistent during the upgrading period.
>
> The feature is only available for UBLK_F_USER_RECOVERY.
>
> Suggested-by: Yoav Cohen <yoav@nvidia.com>
> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
> include/uapi/linux/ublk_cmd.h | 19 ++++++
> 2 files changed, 142 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index fbd075807525..6f51072776f1 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -51,6 +51,7 @@
> /* private ioctl command mirror */
> #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
> +#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
>
> #define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> #define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> @@ -67,7 +68,8 @@
> | UBLK_F_ZONED \
> | UBLK_F_USER_RECOVERY_FAIL_IO \
> | UBLK_F_UPDATE_SIZE \
> - | UBLK_F_AUTO_BUF_REG)
> + | UBLK_F_AUTO_BUF_REG \
> + | UBLK_F_QUIESCE)
>
> #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> | UBLK_F_USER_RECOVERY_REISSUE \
> @@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> return -EINVAL;
> }
>
> + if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
> + pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
> + return -EINVAL;
> + }
> +
> /*
> * unprivileged device can't be trusted, but RECOVERY and
> * RECOVERY_REISSUE still may hang error handling, so can't
> @@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
> set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
> mutex_unlock(&ub->mutex);
> }
> +
> +struct count_busy {
> + const struct ublk_queue *ubq;
> + unsigned int nr_busy;
> +};
> +
> +static bool ublk_count_busy_req(struct request *rq, void *data)
> +{
> + struct count_busy *idle = data;
> +
> + if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
> + idle->nr_busy += 1;
> + return true;
> +}
> +
> +/* uring_cmd is guaranteed to be active if the associated request is idle */
> +static bool ubq_has_idle_io(const struct ublk_queue *ubq)
> +{
> + struct count_busy data = {
> + .ubq = ubq,
> + };
> +
> + blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
> + return data.nr_busy < ubq->q_depth;
> +}
> +
> +/* Wait until each hw queue has at least one idle IO */
> +static int ublk_wait_for_idle_io(struct ublk_device *ub,
> + unsigned int timeout_ms)
> +{
> + unsigned int elapsed = 0;
> + int ret;
> +
> + while (elapsed < timeout_ms && !signal_pending(current)) {
> + unsigned int queues_cancelable = 0;
> + int i;
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + queues_cancelable += !!ubq_has_idle_io(ubq);
> + }
> +
> + /*
> + * Each queue needs at least one active command for
> + * notifying ublk server
> + */
> + if (queues_cancelable == ub->dev_info.nr_hw_queues)
> + break;
> +
> + msleep(UBLK_REQUEUE_DELAY_MS);
> + elapsed += UBLK_REQUEUE_DELAY_MS;
> + }
> +
> + if (signal_pending(current))
> + ret = -EINTR;
> + else if (elapsed >= timeout_ms)
> + ret = -EBUSY;
> + else
> + ret = 0;
> +
> + return ret;
> +}
> +
> +static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
> + const struct ublksrv_ctrl_cmd *header)
> +{
> + /* zero means wait forever */
> + u64 timeout_ms = header->data[0];
> + struct gendisk *disk;
> + int i, ret = -ENODEV;
> +
> + if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
> + return -EOPNOTSUPP;
> +
> + mutex_lock(&ub->mutex);
> + disk = ublk_get_disk(ub);
> + if (!disk)
> + goto unlock;
> + if (ub->dev_info.state == UBLK_S_DEV_DEAD)
> + goto put_disk;
> +
> + ret = 0;
> + /* already in expected state */
> + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> + goto put_disk;
> +
> + /* Mark all queues as canceling */
> + blk_mq_quiesce_queue(disk->queue);
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + ubq->canceling = true;
> + }
> + blk_mq_unquiesce_queue(disk->queue);
> +
> + if (!timeout_ms)
> + timeout_ms = UINT_MAX;
> + ret = ublk_wait_for_idle_io(ub, timeout_ms);
> +
> +put_disk:
> + ublk_put_disk(disk);
> +unlock:
> + mutex_unlock(&ub->mutex);
> +
> + /* Cancel pending uring_cmd */
> + if (!ret)
> + ublk_cancel_dev(ub);
> + return ret;
> +}
> +
> /*
> * All control commands are sent via /dev/ublk-control, so we have to check
> * the destination device's permission
> @@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
> case UBLK_CMD_START_USER_RECOVERY:
> case UBLK_CMD_END_USER_RECOVERY:
> case UBLK_CMD_UPDATE_SIZE:
> + case UBLK_CMD_QUIESCE_DEV:
> mask = MAY_READ | MAY_WRITE;
> break;
> default:
> @@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
> ublk_ctrl_set_size(ub, header);
> ret = 0;
> break;
> + case UBLK_CMD_QUIESCE_DEV:
> + ret = ublk_ctrl_quiesce_dev(ub, header);
> + break;
> default:
> ret = -EOPNOTSUPP;
> break;
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 1c40632cb164..56c7e3fc666f 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -53,6 +53,8 @@
> _IOR('u', 0x14, struct ublksrv_ctrl_cmd)
> #define UBLK_U_CMD_UPDATE_SIZE \
> _IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
> +#define UBLK_U_CMD_QUIESCE_DEV \
> + _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
>
> /*
> * 64bits are enough now, and it should be easy to extend in case of
> @@ -253,6 +255,23 @@
> */
> #define UBLK_F_AUTO_BUF_REG (1ULL << 11)
>
> +/*
> + * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
> + * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
> + * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
> + * handling `UBLK_IO_RES_ABORT` correctly.
> + *
> + * Typical use case is for supporting to upgrade ublk server application,
> + * meantime keep ublk block device persistent during the period.
> + *
> + * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
> + *
> + * Note, this command returns -EBUSY in case that all IO commands are being
> + * handled by ublk server and not completed in specified time period which
> + * is passed from the control command parameter.
> + */
> +#define UBLK_F_QUIESCE (1ULL << 12)
> +
> /* device state */
> #define UBLK_S_DEV_DEAD 0
> #define UBLK_S_DEV_LIVE 1
> --
> 2.47.0
>
--
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-06 9:54 ` Ming Lei
@ 2025-06-08 10:20 ` Yoav Cohen
2025-06-08 14:34 ` Ming Lei
0 siblings, 1 reply; 13+ messages in thread
From: Yoav Cohen @ 2025-06-08 10:20 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
Hi Ming,
Thank you for the reply.
I understand now the requirement for the idle IO but needs your advice.
On our case the backing IOPS are going over the network, when switching to upgrade mode we are shorting the timeout of each IO in order that the application will really finish as soon as possible.
On this case (until now) we used to prevent COMMIT_AND_FETCH on the IOP/s that are fail due to timeout to prevent the user for seeing the failed IOP/s only because we are on upgrade mode.
Checking the code I don't see how we can do it now as there may be a situation where all IOP/s are failed due to it while calling QUIECE_DEV.
I saw that ublk prevent zeroed read but allow zeroed write(__ublk_complete_rq), is that just for supporting devices with backing file or a real requirement for every ublk device?
Any tips if there is other way to make the kernel retry this commands?
Thank you.
________________________________________
From: Ming Lei <ming.lei@redhat.com>
Sent: Friday, June 6, 2025 12:54 PM
To: Yoav Cohen
Cc: Jens Axboe; linux-block@vger.kernel.org; Uday Shankar; Caleb Sander Mateos
Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
External email: Use caution opening links or attachments
Hi Yoav,
On Thu, Jun 05, 2025 at 12:37:01PM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you for that.
> Can you please clarify this
> +/* Wait until each hw queue has at least one idle IO */
> what do you exactly wait here? and why it is per io queue?
> As I understand if the wait timedout the operation will be canceled.
One idle IO means one active io_uring cmd, so we can use this command
for notifying ublk server. Otherwise, ublk server may not get chance to
know the quiesce action.
Because each queue may have standalone task context.
The condition should be satisfied easily in any implementation, but please
let me if it could be one issue in your ublk server implementation.
Big reason is that ublk doesn't have one such admin command for
housekeeping.
Thanks,
Ming
>
> Thank you
>
> ________________________________________
> From: Ming Lei <ming.lei@redhat.com>
> Sent: Thursday, May 22, 2025 7:35 PM
> To: Jens Axboe; linux-block@vger.kernel.org
> Cc: Uday Shankar; Caleb Sander Mateos; Yoav Cohen; Ming Lei
> Subject: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
>
> External email: Use caution opening links or attachments
>
>
> Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
> for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
> or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
> cooperation.
>
> This feature can help to support to upgrade ublk server application by
> shutting down ublk server gracefully, meantime keep ublk block device
> persistent during the upgrading period.
>
> The feature is only available for UBLK_F_USER_RECOVERY.
>
> Suggested-by: Yoav Cohen <yoav@nvidia.com>
> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
> include/uapi/linux/ublk_cmd.h | 19 ++++++
> 2 files changed, 142 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index fbd075807525..6f51072776f1 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -51,6 +51,7 @@
> /* private ioctl command mirror */
> #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
> +#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
>
> #define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> #define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> @@ -67,7 +68,8 @@
> | UBLK_F_ZONED \
> | UBLK_F_USER_RECOVERY_FAIL_IO \
> | UBLK_F_UPDATE_SIZE \
> - | UBLK_F_AUTO_BUF_REG)
> + | UBLK_F_AUTO_BUF_REG \
> + | UBLK_F_QUIESCE)
>
> #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> | UBLK_F_USER_RECOVERY_REISSUE \
> @@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> return -EINVAL;
> }
>
> + if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
> + pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
> + return -EINVAL;
> + }
> +
> /*
> * unprivileged device can't be trusted, but RECOVERY and
> * RECOVERY_REISSUE still may hang error handling, so can't
> @@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
> set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
> mutex_unlock(&ub->mutex);
> }
> +
> +struct count_busy {
> + const struct ublk_queue *ubq;
> + unsigned int nr_busy;
> +};
> +
> +static bool ublk_count_busy_req(struct request *rq, void *data)
> +{
> + struct count_busy *idle = data;
> +
> + if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
> + idle->nr_busy += 1;
> + return true;
> +}
> +
> +/* uring_cmd is guaranteed to be active if the associated request is idle */
> +static bool ubq_has_idle_io(const struct ublk_queue *ubq)
> +{
> + struct count_busy data = {
> + .ubq = ubq,
> + };
> +
> + blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
> + return data.nr_busy < ubq->q_depth;
> +}
> +
> +/* Wait until each hw queue has at least one idle IO */
> +static int ublk_wait_for_idle_io(struct ublk_device *ub,
> + unsigned int timeout_ms)
> +{
> + unsigned int elapsed = 0;
> + int ret;
> +
> + while (elapsed < timeout_ms && !signal_pending(current)) {
> + unsigned int queues_cancelable = 0;
> + int i;
> +
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + queues_cancelable += !!ubq_has_idle_io(ubq);
> + }
> +
> + /*
> + * Each queue needs at least one active command for
> + * notifying ublk server
> + */
> + if (queues_cancelable == ub->dev_info.nr_hw_queues)
> + break;
> +
> + msleep(UBLK_REQUEUE_DELAY_MS);
> + elapsed += UBLK_REQUEUE_DELAY_MS;
> + }
> +
> + if (signal_pending(current))
> + ret = -EINTR;
> + else if (elapsed >= timeout_ms)
> + ret = -EBUSY;
> + else
> + ret = 0;
> +
> + return ret;
> +}
> +
> +static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
> + const struct ublksrv_ctrl_cmd *header)
> +{
> + /* zero means wait forever */
> + u64 timeout_ms = header->data[0];
> + struct gendisk *disk;
> + int i, ret = -ENODEV;
> +
> + if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
> + return -EOPNOTSUPP;
> +
> + mutex_lock(&ub->mutex);
> + disk = ublk_get_disk(ub);
> + if (!disk)
> + goto unlock;
> + if (ub->dev_info.state == UBLK_S_DEV_DEAD)
> + goto put_disk;
> +
> + ret = 0;
> + /* already in expected state */
> + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> + goto put_disk;
> +
> + /* Mark all queues as canceling */
> + blk_mq_quiesce_queue(disk->queue);
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> +
> + ubq->canceling = true;
> + }
> + blk_mq_unquiesce_queue(disk->queue);
> +
> + if (!timeout_ms)
> + timeout_ms = UINT_MAX;
> + ret = ublk_wait_for_idle_io(ub, timeout_ms);
> +
> +put_disk:
> + ublk_put_disk(disk);
> +unlock:
> + mutex_unlock(&ub->mutex);
> +
> + /* Cancel pending uring_cmd */
> + if (!ret)
> + ublk_cancel_dev(ub);
> + return ret;
> +}
> +
> /*
> * All control commands are sent via /dev/ublk-control, so we have to check
> * the destination device's permission
> @@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
> case UBLK_CMD_START_USER_RECOVERY:
> case UBLK_CMD_END_USER_RECOVERY:
> case UBLK_CMD_UPDATE_SIZE:
> + case UBLK_CMD_QUIESCE_DEV:
> mask = MAY_READ | MAY_WRITE;
> break;
> default:
> @@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
> ublk_ctrl_set_size(ub, header);
> ret = 0;
> break;
> + case UBLK_CMD_QUIESCE_DEV:
> + ret = ublk_ctrl_quiesce_dev(ub, header);
> + break;
> default:
> ret = -EOPNOTSUPP;
> break;
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 1c40632cb164..56c7e3fc666f 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -53,6 +53,8 @@
> _IOR('u', 0x14, struct ublksrv_ctrl_cmd)
> #define UBLK_U_CMD_UPDATE_SIZE \
> _IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
> +#define UBLK_U_CMD_QUIESCE_DEV \
> + _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
>
> /*
> * 64bits are enough now, and it should be easy to extend in case of
> @@ -253,6 +255,23 @@
> */
> #define UBLK_F_AUTO_BUF_REG (1ULL << 11)
>
> +/*
> + * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
> + * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
> + * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
> + * handling `UBLK_IO_RES_ABORT` correctly.
> + *
> + * Typical use case is for supporting to upgrade ublk server application,
> + * meantime keep ublk block device persistent during the period.
> + *
> + * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
> + *
> + * Note, this command returns -EBUSY in case that all IO commands are being
> + * handled by ublk server and not completed in specified time period which
> + * is passed from the control command parameter.
> + */
> +#define UBLK_F_QUIESCE (1ULL << 12)
> +
> /* device state */
> #define UBLK_S_DEV_DEAD 0
> #define UBLK_S_DEV_LIVE 1
> --
> 2.47.0
>
--
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-08 10:20 ` Yoav Cohen
@ 2025-06-08 14:34 ` Ming Lei
2025-06-12 8:17 ` Yoav Cohen
0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-06-08 14:34 UTC (permalink / raw)
To: Yoav Cohen
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
On Sun, Jun 08, 2025 at 10:20:25AM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you for the reply.
> I understand now the requirement for the idle IO but needs your advice.
> On our case the backing IOPS are going over the network, when switching to upgrade mode we are shorting the timeout of each IO in order that the application will really finish as soon as possible.
> On this case (until now) we used to prevent COMMIT_AND_FETCH on the IOP/s that are fail due to timeout to prevent the user for seeing the failed IOP/s only because we are on upgrade mode.
> Checking the code I don't see how we can do it now as there may be a situation where all IOP/s are failed due to it while calling QUIECE_DEV.
If the network IO takes too long, you may provide bigger timeout parameter
to QUIECE_DEV or even wait forever, and any one of inflight IO is supposed to
complete during limited time, then QUIECE_DEV can move on.
>
> I saw that ublk prevent zeroed read but allow zeroed write(__ublk_complete_rq), is that just for supporting devices with backing file or a real requirement for every ublk device?
> Any tips if there is other way to make the kernel retry this commands?
> Thank you.
Please set UBLK_F_USER_RECOVERY and UBLK_F_USER_RECOVERY_REISSUE which
won't fail IO during the period, and all should be requeued after device
is recovered to LIVE after your upgrade is done, and all won't be timed out
too.
Please check if the two above flags with QUIECE_DEV work for your case.
Thanks,
Ming
>
> ________________________________________
> From: Ming Lei <ming.lei@redhat.com>
> Sent: Friday, June 6, 2025 12:54 PM
> To: Yoav Cohen
> Cc: Jens Axboe; linux-block@vger.kernel.org; Uday Shankar; Caleb Sander Mateos
> Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
>
> External email: Use caution opening links or attachments
>
>
> Hi Yoav,
>
> On Thu, Jun 05, 2025 at 12:37:01PM +0000, Yoav Cohen wrote:
> > Hi Ming,
> >
> > Thank you for that.
> > Can you please clarify this
> > +/* Wait until each hw queue has at least one idle IO */
> > what do you exactly wait here? and why it is per io queue?
> > As I understand if the wait timedout the operation will be canceled.
>
> One idle IO means one active io_uring cmd, so we can use this command
> for notifying ublk server. Otherwise, ublk server may not get chance to
> know the quiesce action.
>
> Because each queue may have standalone task context.
>
> The condition should be satisfied easily in any implementation, but please
> let me if it could be one issue in your ublk server implementation.
>
> Big reason is that ublk doesn't have one such admin command for
> housekeeping.
>
> Thanks,
> Ming
>
>
> >
> > Thank you
> >
> > ________________________________________
> > From: Ming Lei <ming.lei@redhat.com>
> > Sent: Thursday, May 22, 2025 7:35 PM
> > To: Jens Axboe; linux-block@vger.kernel.org
> > Cc: Uday Shankar; Caleb Sander Mateos; Yoav Cohen; Ming Lei
> > Subject: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
> >
> > External email: Use caution opening links or attachments
> >
> >
> > Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
> > for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
> > or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
> > cooperation.
> >
> > This feature can help to support to upgrade ublk server application by
> > shutting down ublk server gracefully, meantime keep ublk block device
> > persistent during the upgrading period.
> >
> > The feature is only available for UBLK_F_USER_RECOVERY.
> >
> > Suggested-by: Yoav Cohen <yoav@nvidia.com>
> > Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
> > include/uapi/linux/ublk_cmd.h | 19 ++++++
> > 2 files changed, 142 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index fbd075807525..6f51072776f1 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -51,6 +51,7 @@
> > /* private ioctl command mirror */
> > #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> > #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
> > +#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
> >
> > #define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> > #define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> > @@ -67,7 +68,8 @@
> > | UBLK_F_ZONED \
> > | UBLK_F_USER_RECOVERY_FAIL_IO \
> > | UBLK_F_UPDATE_SIZE \
> > - | UBLK_F_AUTO_BUF_REG)
> > + | UBLK_F_AUTO_BUF_REG \
> > + | UBLK_F_QUIESCE)
> >
> > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > | UBLK_F_USER_RECOVERY_REISSUE \
> > @@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > return -EINVAL;
> > }
> >
> > + if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
> > + pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
> > + return -EINVAL;
> > + }
> > +
> > /*
> > * unprivileged device can't be trusted, but RECOVERY and
> > * RECOVERY_REISSUE still may hang error handling, so can't
> > @@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
> > set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
> > mutex_unlock(&ub->mutex);
> > }
> > +
> > +struct count_busy {
> > + const struct ublk_queue *ubq;
> > + unsigned int nr_busy;
> > +};
> > +
> > +static bool ublk_count_busy_req(struct request *rq, void *data)
> > +{
> > + struct count_busy *idle = data;
> > +
> > + if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
> > + idle->nr_busy += 1;
> > + return true;
> > +}
> > +
> > +/* uring_cmd is guaranteed to be active if the associated request is idle */
> > +static bool ubq_has_idle_io(const struct ublk_queue *ubq)
> > +{
> > + struct count_busy data = {
> > + .ubq = ubq,
> > + };
> > +
> > + blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
> > + return data.nr_busy < ubq->q_depth;
> > +}
> > +
> > +/* Wait until each hw queue has at least one idle IO */
> > +static int ublk_wait_for_idle_io(struct ublk_device *ub,
> > + unsigned int timeout_ms)
> > +{
> > + unsigned int elapsed = 0;
> > + int ret;
> > +
> > + while (elapsed < timeout_ms && !signal_pending(current)) {
> > + unsigned int queues_cancelable = 0;
> > + int i;
> > +
> > + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> > + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> > +
> > + queues_cancelable += !!ubq_has_idle_io(ubq);
> > + }
> > +
> > + /*
> > + * Each queue needs at least one active command for
> > + * notifying ublk server
> > + */
> > + if (queues_cancelable == ub->dev_info.nr_hw_queues)
> > + break;
> > +
> > + msleep(UBLK_REQUEUE_DELAY_MS);
> > + elapsed += UBLK_REQUEUE_DELAY_MS;
> > + }
> > +
> > + if (signal_pending(current))
> > + ret = -EINTR;
> > + else if (elapsed >= timeout_ms)
> > + ret = -EBUSY;
> > + else
> > + ret = 0;
> > +
> > + return ret;
> > +}
> > +
> > +static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
> > + const struct ublksrv_ctrl_cmd *header)
> > +{
> > + /* zero means wait forever */
> > + u64 timeout_ms = header->data[0];
> > + struct gendisk *disk;
> > + int i, ret = -ENODEV;
> > +
> > + if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
> > + return -EOPNOTSUPP;
> > +
> > + mutex_lock(&ub->mutex);
> > + disk = ublk_get_disk(ub);
> > + if (!disk)
> > + goto unlock;
> > + if (ub->dev_info.state == UBLK_S_DEV_DEAD)
> > + goto put_disk;
> > +
> > + ret = 0;
> > + /* already in expected state */
> > + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> > + goto put_disk;
> > +
> > + /* Mark all queues as canceling */
> > + blk_mq_quiesce_queue(disk->queue);
> > + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> > + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> > +
> > + ubq->canceling = true;
> > + }
> > + blk_mq_unquiesce_queue(disk->queue);
> > +
> > + if (!timeout_ms)
> > + timeout_ms = UINT_MAX;
> > + ret = ublk_wait_for_idle_io(ub, timeout_ms);
> > +
> > +put_disk:
> > + ublk_put_disk(disk);
> > +unlock:
> > + mutex_unlock(&ub->mutex);
> > +
> > + /* Cancel pending uring_cmd */
> > + if (!ret)
> > + ublk_cancel_dev(ub);
> > + return ret;
> > +}
> > +
> > /*
> > * All control commands are sent via /dev/ublk-control, so we have to check
> > * the destination device's permission
> > @@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
> > case UBLK_CMD_START_USER_RECOVERY:
> > case UBLK_CMD_END_USER_RECOVERY:
> > case UBLK_CMD_UPDATE_SIZE:
> > + case UBLK_CMD_QUIESCE_DEV:
> > mask = MAY_READ | MAY_WRITE;
> > break;
> > default:
> > @@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
> > ublk_ctrl_set_size(ub, header);
> > ret = 0;
> > break;
> > + case UBLK_CMD_QUIESCE_DEV:
> > + ret = ublk_ctrl_quiesce_dev(ub, header);
> > + break;
> > default:
> > ret = -EOPNOTSUPP;
> > break;
> > diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> > index 1c40632cb164..56c7e3fc666f 100644
> > --- a/include/uapi/linux/ublk_cmd.h
> > +++ b/include/uapi/linux/ublk_cmd.h
> > @@ -53,6 +53,8 @@
> > _IOR('u', 0x14, struct ublksrv_ctrl_cmd)
> > #define UBLK_U_CMD_UPDATE_SIZE \
> > _IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
> > +#define UBLK_U_CMD_QUIESCE_DEV \
> > + _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
> >
> > /*
> > * 64bits are enough now, and it should be easy to extend in case of
> > @@ -253,6 +255,23 @@
> > */
> > #define UBLK_F_AUTO_BUF_REG (1ULL << 11)
> >
> > +/*
> > + * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
> > + * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
> > + * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
> > + * handling `UBLK_IO_RES_ABORT` correctly.
> > + *
> > + * Typical use case is for supporting to upgrade ublk server application,
> > + * meantime keep ublk block device persistent during the period.
> > + *
> > + * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
> > + *
> > + * Note, this command returns -EBUSY in case that all IO commands are being
> > + * handled by ublk server and not completed in specified time period which
> > + * is passed from the control command parameter.
> > + */
> > +#define UBLK_F_QUIESCE (1ULL << 12)
> > +
> > /* device state */
> > #define UBLK_S_DEV_DEAD 0
> > #define UBLK_S_DEV_LIVE 1
> > --
> > 2.47.0
> >
>
> --
> Ming
>
--
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-08 14:34 ` Ming Lei
@ 2025-06-12 8:17 ` Yoav Cohen
2025-06-12 9:15 ` Ming Lei
0 siblings, 1 reply; 13+ messages in thread
From: Yoav Cohen @ 2025-06-12 8:17 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
Hi Ming,
Thank you very much, I managed to integrate the feature to our application and it seems to work perfectly fine during our update tests.
Just a double check: when UBLK_F_USER_RECOVERY & UBLK_F_USER_RECOVERY_REISSUE
and QUIECE_DEV was called - does any IO that will be completed using COMMIT_AND_FETCH with a failure (i.e result=-EIO) will be retry after the recovery stage?
Thank you again!
________________________________________
From: Ming Lei <ming.lei@redhat.com>
Sent: Sunday, June 8, 2025 5:34 PM
To: Yoav Cohen
Cc: Jens Axboe; linux-block@vger.kernel.org; Uday Shankar; Caleb Sander Mateos
Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
External email: Use caution opening links or attachments
On Sun, Jun 08, 2025 at 10:20:25AM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you for the reply.
> I understand now the requirement for the idle IO but needs your advice.
> On our case the backing IOPS are going over the network, when switching to upgrade mode we are shorting the timeout of each IO in order that the application will really finish as soon as possible.
> On this case (until now) we used to prevent COMMIT_AND_FETCH on the IOP/s that are fail due to timeout to prevent the user for seeing the failed IOP/s only because we are on upgrade mode.
> Checking the code I don't see how we can do it now as there may be a situation where all IOP/s are failed due to it while calling QUIECE_DEV.
If the network IO takes too long, you may provide bigger timeout parameter
to QUIECE_DEV or even wait forever, and any one of inflight IO is supposed to
complete during limited time, then QUIECE_DEV can move on.
>
> I saw that ublk prevent zeroed read but allow zeroed write(__ublk_complete_rq), is that just for supporting devices with backing file or a real requirement for every ublk device?
> Any tips if there is other way to make the kernel retry this commands?
> Thank you.
Please set UBLK_F_USER_RECOVERY and UBLK_F_USER_RECOVERY_REISSUE which
won't fail IO during the period, and all should be requeued after device
is recovered to LIVE after your upgrade is done, and all won't be timed out
too.
Please check if the two above flags with QUIECE_DEV work for your case.
Thanks,
Ming
>
> ________________________________________
> From: Ming Lei <ming.lei@redhat.com>
> Sent: Friday, June 6, 2025 12:54 PM
> To: Yoav Cohen
> Cc: Jens Axboe; linux-block@vger.kernel.org; Uday Shankar; Caleb Sander Mateos
> Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
>
> External email: Use caution opening links or attachments
>
>
> Hi Yoav,
>
> On Thu, Jun 05, 2025 at 12:37:01PM +0000, Yoav Cohen wrote:
> > Hi Ming,
> >
> > Thank you for that.
> > Can you please clarify this
> > +/* Wait until each hw queue has at least one idle IO */
> > what do you exactly wait here? and why it is per io queue?
> > As I understand if the wait timedout the operation will be canceled.
>
> One idle IO means one active io_uring cmd, so we can use this command
> for notifying ublk server. Otherwise, ublk server may not get chance to
> know the quiesce action.
>
> Because each queue may have standalone task context.
>
> The condition should be satisfied easily in any implementation, but please
> let me if it could be one issue in your ublk server implementation.
>
> Big reason is that ublk doesn't have one such admin command for
> housekeeping.
>
> Thanks,
> Ming
>
>
> >
> > Thank you
> >
> > ________________________________________
> > From: Ming Lei <ming.lei@redhat.com>
> > Sent: Thursday, May 22, 2025 7:35 PM
> > To: Jens Axboe; linux-block@vger.kernel.org
> > Cc: Uday Shankar; Caleb Sander Mateos; Yoav Cohen; Ming Lei
> > Subject: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
> >
> > External email: Use caution opening links or attachments
> >
> >
> > Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
> > for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
> > or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
> > cooperation.
> >
> > This feature can help to support to upgrade ublk server application by
> > shutting down ublk server gracefully, meantime keep ublk block device
> > persistent during the upgrading period.
> >
> > The feature is only available for UBLK_F_USER_RECOVERY.
> >
> > Suggested-by: Yoav Cohen <yoav@nvidia.com>
> > Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 124 +++++++++++++++++++++++++++++++++-
> > include/uapi/linux/ublk_cmd.h | 19 ++++++
> > 2 files changed, 142 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index fbd075807525..6f51072776f1 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -51,6 +51,7 @@
> > /* private ioctl command mirror */
> > #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> > #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE)
> > +#define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV)
> >
> > #define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> > #define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> > @@ -67,7 +68,8 @@
> > | UBLK_F_ZONED \
> > | UBLK_F_USER_RECOVERY_FAIL_IO \
> > | UBLK_F_UPDATE_SIZE \
> > - | UBLK_F_AUTO_BUF_REG)
> > + | UBLK_F_AUTO_BUF_REG \
> > + | UBLK_F_QUIESCE)
> >
> > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > | UBLK_F_USER_RECOVERY_REISSUE \
> > @@ -2841,6 +2843,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > return -EINVAL;
> > }
> >
> > + if ((info.flags & UBLK_F_QUIESCE) && !(info.flags & UBLK_F_USER_RECOVERY)) {
> > + pr_warn("UBLK_F_QUIESCE requires UBLK_F_USER_RECOVERY\n");
> > + return -EINVAL;
> > + }
> > +
> > /*
> > * unprivileged device can't be trusted, but RECOVERY and
> > * RECOVERY_REISSUE still may hang error handling, so can't
> > @@ -3233,6 +3240,117 @@ static void ublk_ctrl_set_size(struct ublk_device *ub, const struct ublksrv_ctrl
> > set_capacity_and_notify(ub->ub_disk, p->dev_sectors);
> > mutex_unlock(&ub->mutex);
> > }
> > +
> > +struct count_busy {
> > + const struct ublk_queue *ubq;
> > + unsigned int nr_busy;
> > +};
> > +
> > +static bool ublk_count_busy_req(struct request *rq, void *data)
> > +{
> > + struct count_busy *idle = data;
> > +
> > + if (!blk_mq_request_started(rq) && rq->mq_hctx->driver_data == idle->ubq)
> > + idle->nr_busy += 1;
> > + return true;
> > +}
> > +
> > +/* uring_cmd is guaranteed to be active if the associated request is idle */
> > +static bool ubq_has_idle_io(const struct ublk_queue *ubq)
> > +{
> > + struct count_busy data = {
> > + .ubq = ubq,
> > + };
> > +
> > + blk_mq_tagset_busy_iter(&ubq->dev->tag_set, ublk_count_busy_req, &data);
> > + return data.nr_busy < ubq->q_depth;
> > +}
> > +
> > +/* Wait until each hw queue has at least one idle IO */
> > +static int ublk_wait_for_idle_io(struct ublk_device *ub,
> > + unsigned int timeout_ms)
> > +{
> > + unsigned int elapsed = 0;
> > + int ret;
> > +
> > + while (elapsed < timeout_ms && !signal_pending(current)) {
> > + unsigned int queues_cancelable = 0;
> > + int i;
> > +
> > + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> > + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> > +
> > + queues_cancelable += !!ubq_has_idle_io(ubq);
> > + }
> > +
> > + /*
> > + * Each queue needs at least one active command for
> > + * notifying ublk server
> > + */
> > + if (queues_cancelable == ub->dev_info.nr_hw_queues)
> > + break;
> > +
> > + msleep(UBLK_REQUEUE_DELAY_MS);
> > + elapsed += UBLK_REQUEUE_DELAY_MS;
> > + }
> > +
> > + if (signal_pending(current))
> > + ret = -EINTR;
> > + else if (elapsed >= timeout_ms)
> > + ret = -EBUSY;
> > + else
> > + ret = 0;
> > +
> > + return ret;
> > +}
> > +
> > +static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
> > + const struct ublksrv_ctrl_cmd *header)
> > +{
> > + /* zero means wait forever */
> > + u64 timeout_ms = header->data[0];
> > + struct gendisk *disk;
> > + int i, ret = -ENODEV;
> > +
> > + if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
> > + return -EOPNOTSUPP;
> > +
> > + mutex_lock(&ub->mutex);
> > + disk = ublk_get_disk(ub);
> > + if (!disk)
> > + goto unlock;
> > + if (ub->dev_info.state == UBLK_S_DEV_DEAD)
> > + goto put_disk;
> > +
> > + ret = 0;
> > + /* already in expected state */
> > + if (ub->dev_info.state != UBLK_S_DEV_LIVE)
> > + goto put_disk;
> > +
> > + /* Mark all queues as canceling */
> > + blk_mq_quiesce_queue(disk->queue);
> > + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> > + struct ublk_queue *ubq = ublk_get_queue(ub, i);
> > +
> > + ubq->canceling = true;
> > + }
> > + blk_mq_unquiesce_queue(disk->queue);
> > +
> > + if (!timeout_ms)
> > + timeout_ms = UINT_MAX;
> > + ret = ublk_wait_for_idle_io(ub, timeout_ms);
> > +
> > +put_disk:
> > + ublk_put_disk(disk);
> > +unlock:
> > + mutex_unlock(&ub->mutex);
> > +
> > + /* Cancel pending uring_cmd */
> > + if (!ret)
> > + ublk_cancel_dev(ub);
> > + return ret;
> > +}
> > +
> > /*
> > * All control commands are sent via /dev/ublk-control, so we have to check
> > * the destination device's permission
> > @@ -3319,6 +3437,7 @@ static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub,
> > case UBLK_CMD_START_USER_RECOVERY:
> > case UBLK_CMD_END_USER_RECOVERY:
> > case UBLK_CMD_UPDATE_SIZE:
> > + case UBLK_CMD_QUIESCE_DEV:
> > mask = MAY_READ | MAY_WRITE;
> > break;
> > default:
> > @@ -3414,6 +3533,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
> > ublk_ctrl_set_size(ub, header);
> > ret = 0;
> > break;
> > + case UBLK_CMD_QUIESCE_DEV:
> > + ret = ublk_ctrl_quiesce_dev(ub, header);
> > + break;
> > default:
> > ret = -EOPNOTSUPP;
> > break;
> > diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> > index 1c40632cb164..56c7e3fc666f 100644
> > --- a/include/uapi/linux/ublk_cmd.h
> > +++ b/include/uapi/linux/ublk_cmd.h
> > @@ -53,6 +53,8 @@
> > _IOR('u', 0x14, struct ublksrv_ctrl_cmd)
> > #define UBLK_U_CMD_UPDATE_SIZE \
> > _IOWR('u', 0x15, struct ublksrv_ctrl_cmd)
> > +#define UBLK_U_CMD_QUIESCE_DEV \
> > + _IOWR('u', 0x16, struct ublksrv_ctrl_cmd)
> >
> > /*
> > * 64bits are enough now, and it should be easy to extend in case of
> > @@ -253,6 +255,23 @@
> > */
> > #define UBLK_F_AUTO_BUF_REG (1ULL << 11)
> >
> > +/*
> > + * Control command `UBLK_U_CMD_QUIESCE_DEV` is added for quiescing device,
> > + * which state can be transitioned to `UBLK_S_DEV_QUIESCED` or
> > + * `UBLK_S_DEV_FAIL_IO` finally, and it needs ublk server cooperation for
> > + * handling `UBLK_IO_RES_ABORT` correctly.
> > + *
> > + * Typical use case is for supporting to upgrade ublk server application,
> > + * meantime keep ublk block device persistent during the period.
> > + *
> > + * This feature is only available when UBLK_F_USER_RECOVERY is enabled.
> > + *
> > + * Note, this command returns -EBUSY in case that all IO commands are being
> > + * handled by ublk server and not completed in specified time period which
> > + * is passed from the control command parameter.
> > + */
> > +#define UBLK_F_QUIESCE (1ULL << 12)
> > +
> > /* device state */
> > #define UBLK_S_DEV_DEAD 0
> > #define UBLK_S_DEV_LIVE 1
> > --
> > 2.47.0
> >
>
> --
> Ming
>
--
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-12 8:17 ` Yoav Cohen
@ 2025-06-12 9:15 ` Ming Lei
2025-06-12 12:04 ` Yoav Cohen
0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2025-06-12 9:15 UTC (permalink / raw)
To: Yoav Cohen
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
On Thu, Jun 12, 2025 at 08:17:49AM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you very much, I managed to integrate the feature to our application and it seems to work perfectly fine during our update tests.
> Just a double check: when UBLK_F_USER_RECOVERY & UBLK_F_USER_RECOVERY_REISSUE
> and QUIECE_DEV was called - does any IO that will be completed using COMMIT_AND_FETCH with a failure (i.e result=-EIO) will be retry after the recovery stage?
>
UBLK_F_USER_RECOVERY_REISSUE supposes all inflight IOs are failed, so
these IOs will be re-delivered to ublk server after recovering to LIVE.
So you needn't to complete the IOs with -EIO for retrying them during
recovery, in short:
- if ublk server handles inflight IOs, complete them by sending COMMIT_AND_FETCH
with result before closing '/dev/ublkcN', and these IOs will not be re-issued
by driver after recovery
- otherwise, just ignore & discard inflight IOs, they all will be
re-issued by driver after recovery via UBLK_F_USER_RECOVERY_REISSUE.
Thanks,
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-12 9:15 ` Ming Lei
@ 2025-06-12 12:04 ` Yoav Cohen
2025-06-23 2:13 ` Ming Lei
0 siblings, 1 reply; 13+ messages in thread
From: Yoav Cohen @ 2025-06-12 12:04 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
Hi Ming,
So I know it's a radical situation but my only concern is that:
0) On our application timeout of IO may be set to few minutes as it is goes over the network.
1) Let's assume we have 1 queue with QD=1.
2) the Only IO is in the userspace application but as we send the IOs over the network it may be stuck due to connectivy issues.
3) User trying to upgrade/stop the application so we issue Quiesce_DEV with infinite timeout as we want to ensure it works.
4) We are stuck now until network connection will recover or Our datapath will somehow Issue the COMMIT_AND_FETCH back to to the kernel so it we can get the ABORT later and QUICESE_DEV can finish.
Problem is that I don't want to wait for this IO until recovery but on the other end I don't want to complete the IO with error to the user.
So on this case I guess we can abort the application or something but maybe it will be cleaner that on Quiesce_DEV will need to issue another SQE per queue or something so we can notify the application this way about it.
Anyway also on our case it will be super rare to happen where there is a queue without an idle operation + network is currently down but we try to be complete as possible.
What do you think?
Thank you
________________________________________
From: Ming Lei <ming.lei@redhat.com>
Sent: Thursday, June 12, 2025 12:15 PM
To: Yoav Cohen
Cc: Jens Axboe; linux-block@vger.kernel.org; Uday Shankar; Caleb Sander Mateos
Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
External email: Use caution opening links or attachments
On Thu, Jun 12, 2025 at 08:17:49AM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you very much, I managed to integrate the feature to our application and it seems to work perfectly fine during our update tests.
> Just a double check: when UBLK_F_USER_RECOVERY & UBLK_F_USER_RECOVERY_REISSUE
> and QUIECE_DEV was called - does any IO that will be completed using COMMIT_AND_FETCH with a failure (i.e result=-EIO) will be retry after the recovery stage?
>
UBLK_F_USER_RECOVERY_REISSUE supposes all inflight IOs are failed, so
these IOs will be re-delivered to ublk server after recovering to LIVE.
So you needn't to complete the IOs with -EIO for retrying them during
recovery, in short:
- if ublk server handles inflight IOs, complete them by sending COMMIT_AND_FETCH
with result before closing '/dev/ublkcN', and these IOs will not be re-issued
by driver after recovery
- otherwise, just ignore & discard inflight IOs, they all will be
re-issued by driver after recovery via UBLK_F_USER_RECOVERY_REISSUE.
Thanks,
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE
2025-06-12 12:04 ` Yoav Cohen
@ 2025-06-23 2:13 ` Ming Lei
0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2025-06-23 2:13 UTC (permalink / raw)
To: Yoav Cohen
Cc: Jens Axboe, linux-block@vger.kernel.org, Uday Shankar,
Caleb Sander Mateos
On Thu, Jun 12, 2025 at 12:04:39PM +0000, Yoav Cohen wrote:
>
> Hi Ming,
>
> So I know it's a radical situation but my only concern is that:
>
> 0) On our application timeout of IO may be set to few minutes as it is goes over the network.
> 1) Let's assume we have 1 queue with QD=1.
> 2) the Only IO is in the userspace application but as we send the IOs over the network it may be stuck due to connectivy issues.
> 3) User trying to upgrade/stop the application so we issue Quiesce_DEV with infinite timeout as we want to ensure it works.
> 4) We are stuck now until network connection will recover or Our datapath will somehow Issue the COMMIT_AND_FETCH back to to the kernel so it we can get the ABORT later and QUICESE_DEV can finish.
>
> Problem is that I don't want to wait for this IO until recovery but on the other end I don't want to complete the IO with error to the user.
> So on this case I guess we can abort the application or something but maybe it will be cleaner that on Quiesce_DEV will need to issue another SQE per queue or something so we can notify the application this way about it.
>
> Anyway also on our case it will be super rare to happen where there is a queue without an idle operation + network is currently down but we try to be complete as possible.
> What do you think?
Hi Yoav,
The multishot approach for fetching io command should address the issue
wrt. single queue depth case:
https://github.com/ming1/linux/commits/ublk2-cmd-batch.v3/
In which there is always one multishot FETCH_IO_CMDS for notifying ublk
server for new io commands(requests) in batch way.
Thanks,
Ming
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-06-23 2:13 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22 16:35 [PATCH 0/3] ublk: add UBLK_F_QUIESCE Ming Lei
2025-05-22 16:35 ` [PATCH 1/3] selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE Ming Lei
2025-05-22 16:35 ` [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE Ming Lei
2025-06-05 12:37 ` Yoav Cohen
2025-06-06 9:54 ` Ming Lei
2025-06-08 10:20 ` Yoav Cohen
2025-06-08 14:34 ` Ming Lei
2025-06-12 8:17 ` Yoav Cohen
2025-06-12 9:15 ` Ming Lei
2025-06-12 12:04 ` Yoav Cohen
2025-06-23 2:13 ` Ming Lei
2025-05-22 16:35 ` [PATCH 3/3] selftests: ublk: add test for UBLK_F_QUIESCE Ming Lei
2025-05-23 15:42 ` [PATCH 0/3] ublk: add UBLK_F_QUIESCE Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox