* [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-04-26 9:41 [PATCH 0/4] ublk: two fixes and support UBLK_F_AUTO_ZERO_COPY Ming Lei
@ 2025-04-26 9:41 ` Ming Lei
2025-04-26 20:15 ` Caleb Sander Mateos
2025-04-26 9:41 ` [PATCH 2/4] ublk: enhance check for register/unregister io buffer command Ming Lei
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-26 9:41 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Uday Shankar, Caleb Sander Mateos, Ming Lei
Commit 57e13a2e8cd2 ("selftests: ublk: support user recovery") starts to
support UBLK_F_NEED_GET_DATA for covering recovery feature, however the
ublk utility implementation isn't done correctly.
Fix it by supporting UBLK_F_NEED_GET_DATA correctly.
Also add test generic_07 for covering UBLK_F_NEED_GET_DATA.
Fixes: 57e13a2e8cd2 ("selftests: ublk: support user recovery")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 11 +++++---
tools/testing/selftests/ublk/kublk.h | 1 +
.../testing/selftests/ublk/test_generic_07.sh | 25 +++++++++++++++++++
4 files changed, 35 insertions(+), 3 deletions(-)
create mode 100755 tools/testing/selftests/ublk/test_generic_07.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index ec4624a283bc..f34ac0bac696 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -9,6 +9,7 @@ TEST_PROGS += test_generic_03.sh
TEST_PROGS += test_generic_04.sh
TEST_PROGS += test_generic_05.sh
TEST_PROGS += test_generic_06.sh
+TEST_PROGS += test_generic_07.sh
TEST_PROGS += test_null_01.sh
TEST_PROGS += test_null_02.sh
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index e57a1486bb48..701b47f98902 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -538,10 +538,12 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
/* we issue because we need either fetching or committing */
if (!(io->flags &
- (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP)))
+ (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_NEED_GET_DATA)))
return 0;
- if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
+ if (io->flags & UBLKSRV_NEED_GET_DATA)
+ cmd_op = UBLK_U_IO_NEED_GET_DATA;
+ else if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
cmd_op = UBLK_U_IO_FETCH_REQ;
@@ -658,6 +660,9 @@ static void ublk_handle_cqe(struct io_uring *r,
assert(tag < q->q_depth);
if (q->tgt_ops->queue_io)
q->tgt_ops->queue_io(q, tag);
+ } else if (cqe->res == UBLK_IO_RES_NEED_GET_DATA) {
+ io->flags |= UBLKSRV_NEED_GET_DATA | UBLKSRV_IO_FREE;
+ ublk_queue_io_cmd(q, io, tag);
} else {
/*
* COMMIT_REQ will be completed immediately since no fetching
@@ -1313,7 +1318,7 @@ int main(int argc, char *argv[])
opterr = 0;
optind = 2;
- while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:az",
+ while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:g:az",
longopts, &option_idx)) != -1) {
switch (opt) {
case 'a':
diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
index 918db5cd633f..44ee1e4ac55b 100644
--- a/tools/testing/selftests/ublk/kublk.h
+++ b/tools/testing/selftests/ublk/kublk.h
@@ -115,6 +115,7 @@ struct ublk_io {
#define UBLKSRV_NEED_FETCH_RQ (1UL << 0)
#define UBLKSRV_NEED_COMMIT_RQ_COMP (1UL << 1)
#define UBLKSRV_IO_FREE (1UL << 2)
+#define UBLKSRV_NEED_GET_DATA (1UL << 3)
unsigned short flags;
unsigned short refs; /* used by target code only */
diff --git a/tools/testing/selftests/ublk/test_generic_07.sh b/tools/testing/selftests/ublk/test_generic_07.sh
new file mode 100755
index 000000000000..5d82b5955006
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_07.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_07"
+ERR_CODE=0
+
+_prep_test "generic" "test UBLK_F_NEED_GET_DATA"
+
+_create_backfile 0 256M
+dev_id=$(_add_ublk_dev -t loop -q 2 -g 1 "${UBLK_BACKFILES[0]}")
+_check_add_dev $TID $?
+
+# run fio over the ublk disk
+if ! _run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M; then
+ _cleanup_test "generic"
+ _show_result $TID 255
+fi
+
+_mkfs_mount_test /dev/ublkb"${dev_id}"
+ERR_CODE=$?
+
+_cleanup_test "generic"
+_show_result $TID $ERR_CODE
--
2.47.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-04-26 9:41 ` [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA Ming Lei
@ 2025-04-26 20:15 ` Caleb Sander Mateos
2025-04-27 1:26 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-04-26 20:15 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Commit 57e13a2e8cd2 ("selftests: ublk: support user recovery") starts to
> support UBLK_F_NEED_GET_DATA for covering recovery feature, however the
> ublk utility implementation isn't done correctly.
>
> Fix it by supporting UBLK_F_NEED_GET_DATA correctly.
>
> Also add test generic_07 for covering UBLK_F_NEED_GET_DATA.
Looks good to me, just a few minor comments.
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
>
> Fixes: 57e13a2e8cd2 ("selftests: ublk: support user recovery")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> tools/testing/selftests/ublk/Makefile | 1 +
> tools/testing/selftests/ublk/kublk.c | 11 +++++---
> tools/testing/selftests/ublk/kublk.h | 1 +
> .../testing/selftests/ublk/test_generic_07.sh | 25 +++++++++++++++++++
> 4 files changed, 35 insertions(+), 3 deletions(-)
> create mode 100755 tools/testing/selftests/ublk/test_generic_07.sh
>
> diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
> index ec4624a283bc..f34ac0bac696 100644
> --- a/tools/testing/selftests/ublk/Makefile
> +++ b/tools/testing/selftests/ublk/Makefile
> @@ -9,6 +9,7 @@ TEST_PROGS += test_generic_03.sh
> TEST_PROGS += test_generic_04.sh
> TEST_PROGS += test_generic_05.sh
> TEST_PROGS += test_generic_06.sh
> +TEST_PROGS += test_generic_07.sh
>
> TEST_PROGS += test_null_01.sh
> TEST_PROGS += test_null_02.sh
> diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> index e57a1486bb48..701b47f98902 100644
> --- a/tools/testing/selftests/ublk/kublk.c
> +++ b/tools/testing/selftests/ublk/kublk.c
> @@ -538,10 +538,12 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
>
> /* we issue because we need either fetching or committing */
> if (!(io->flags &
> - (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP)))
> + (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_NEED_GET_DATA)))
Comment could use an update
> return 0;
>
> - if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
> + if (io->flags & UBLKSRV_NEED_GET_DATA)
> + cmd_op = UBLK_U_IO_NEED_GET_DATA;
> + else if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
> cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
> else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
> cmd_op = UBLK_U_IO_FETCH_REQ;
> @@ -658,6 +660,9 @@ static void ublk_handle_cqe(struct io_uring *r,
> assert(tag < q->q_depth);
> if (q->tgt_ops->queue_io)
> q->tgt_ops->queue_io(q, tag);
> + } else if (cqe->res == UBLK_IO_RES_NEED_GET_DATA) {
> + io->flags |= UBLKSRV_NEED_GET_DATA | UBLKSRV_IO_FREE;
> + ublk_queue_io_cmd(q, io, tag);
> } else {
> /*
> * COMMIT_REQ will be completed immediately since no fetching
> @@ -1313,7 +1318,7 @@ int main(int argc, char *argv[])
>
> opterr = 0;
> optind = 2;
> - while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:az",
> + while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:g:az",
> longopts, &option_idx)) != -1) {
It's a little strange for -g to take an argument since it's basically
a boolean flag. But it looks like several other flags behave the same
way.
> switch (opt) {
> case 'a':
> diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
> index 918db5cd633f..44ee1e4ac55b 100644
> --- a/tools/testing/selftests/ublk/kublk.h
> +++ b/tools/testing/selftests/ublk/kublk.h
> @@ -115,6 +115,7 @@ struct ublk_io {
> #define UBLKSRV_NEED_FETCH_RQ (1UL << 0)
> #define UBLKSRV_NEED_COMMIT_RQ_COMP (1UL << 1)
> #define UBLKSRV_IO_FREE (1UL << 2)
> +#define UBLKSRV_NEED_GET_DATA (1UL << 3)
> unsigned short flags;
> unsigned short refs; /* used by target code only */
>
> diff --git a/tools/testing/selftests/ublk/test_generic_07.sh b/tools/testing/selftests/ublk/test_generic_07.sh
> new file mode 100755
> index 000000000000..5d82b5955006
> --- /dev/null
> +++ b/tools/testing/selftests/ublk/test_generic_07.sh
> @@ -0,0 +1,25 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
> +
> +TID="generic_07"
> +ERR_CODE=0
> +
> +_prep_test "generic" "test UBLK_F_NEED_GET_DATA"
> +
> +_create_backfile 0 256M
> +dev_id=$(_add_ublk_dev -t loop -q 2 -g 1 "${UBLK_BACKFILES[0]}")
> +_check_add_dev $TID $?
> +
> +# run fio over the ublk disk
> +if ! _run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M; then
> + _cleanup_test "generic"
> + _show_result $TID 255
Propagate the return code from fio?
> +fi
All the other tests that use _run_fio_verify_io appear to check
_have_program fio first. Is that necessary here too?
> +
> +_mkfs_mount_test /dev/ublkb"${dev_id}"
> +ERR_CODE=$?
> +
> +_cleanup_test "generic"
> +_show_result $TID $ERR_CODE
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-04-26 20:15 ` Caleb Sander Mateos
@ 2025-04-27 1:26 ` Ming Lei
0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2025-04-27 1:26 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 01:15:58PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Commit 57e13a2e8cd2 ("selftests: ublk: support user recovery") starts to
> > support UBLK_F_NEED_GET_DATA for covering recovery feature, however the
> > ublk utility implementation isn't done correctly.
> >
> > Fix it by supporting UBLK_F_NEED_GET_DATA correctly.
> >
> > Also add test generic_07 for covering UBLK_F_NEED_GET_DATA.
>
> Looks good to me, just a few minor comments.
>
> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
>
> >
> > Fixes: 57e13a2e8cd2 ("selftests: ublk: support user recovery")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > tools/testing/selftests/ublk/Makefile | 1 +
> > tools/testing/selftests/ublk/kublk.c | 11 +++++---
> > tools/testing/selftests/ublk/kublk.h | 1 +
> > .../testing/selftests/ublk/test_generic_07.sh | 25 +++++++++++++++++++
> > 4 files changed, 35 insertions(+), 3 deletions(-)
> > create mode 100755 tools/testing/selftests/ublk/test_generic_07.sh
> >
> > diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
> > index ec4624a283bc..f34ac0bac696 100644
> > --- a/tools/testing/selftests/ublk/Makefile
> > +++ b/tools/testing/selftests/ublk/Makefile
> > @@ -9,6 +9,7 @@ TEST_PROGS += test_generic_03.sh
> > TEST_PROGS += test_generic_04.sh
> > TEST_PROGS += test_generic_05.sh
> > TEST_PROGS += test_generic_06.sh
> > +TEST_PROGS += test_generic_07.sh
> >
> > TEST_PROGS += test_null_01.sh
> > TEST_PROGS += test_null_02.sh
> > diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> > index e57a1486bb48..701b47f98902 100644
> > --- a/tools/testing/selftests/ublk/kublk.c
> > +++ b/tools/testing/selftests/ublk/kublk.c
> > @@ -538,10 +538,12 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
> >
> > /* we issue because we need either fetching or committing */
> > if (!(io->flags &
> > - (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP)))
> > + (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_NEED_GET_DATA)))
>
> Comment could use an update
>
> > return 0;
> >
> > - if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
> > + if (io->flags & UBLKSRV_NEED_GET_DATA)
> > + cmd_op = UBLK_U_IO_NEED_GET_DATA;
> > + else if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
> > cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
> > else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
> > cmd_op = UBLK_U_IO_FETCH_REQ;
> > @@ -658,6 +660,9 @@ static void ublk_handle_cqe(struct io_uring *r,
> > assert(tag < q->q_depth);
> > if (q->tgt_ops->queue_io)
> > q->tgt_ops->queue_io(q, tag);
> > + } else if (cqe->res == UBLK_IO_RES_NEED_GET_DATA) {
> > + io->flags |= UBLKSRV_NEED_GET_DATA | UBLKSRV_IO_FREE;
> > + ublk_queue_io_cmd(q, io, tag);
> > } else {
> > /*
> > * COMMIT_REQ will be completed immediately since no fetching
> > @@ -1313,7 +1318,7 @@ int main(int argc, char *argv[])
> >
> > opterr = 0;
> > optind = 2;
> > - while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:az",
> > + while ((opt = getopt_long(argc, argv, "t:n:d:q:r:e:i:g:az",
> > longopts, &option_idx)) != -1) {
>
> It's a little strange for -g to take an argument since it's basically
> a boolean flag. But it looks like several other flags behave the same
> way.
Yeah, we can make it one bool flag.
>
> > switch (opt) {
> > case 'a':
> > diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
> > index 918db5cd633f..44ee1e4ac55b 100644
> > --- a/tools/testing/selftests/ublk/kublk.h
> > +++ b/tools/testing/selftests/ublk/kublk.h
> > @@ -115,6 +115,7 @@ struct ublk_io {
> > #define UBLKSRV_NEED_FETCH_RQ (1UL << 0)
> > #define UBLKSRV_NEED_COMMIT_RQ_COMP (1UL << 1)
> > #define UBLKSRV_IO_FREE (1UL << 2)
> > +#define UBLKSRV_NEED_GET_DATA (1UL << 3)
> > unsigned short flags;
> > unsigned short refs; /* used by target code only */
> >
> > diff --git a/tools/testing/selftests/ublk/test_generic_07.sh b/tools/testing/selftests/ublk/test_generic_07.sh
> > new file mode 100755
> > index 000000000000..5d82b5955006
> > --- /dev/null
> > +++ b/tools/testing/selftests/ublk/test_generic_07.sh
> > @@ -0,0 +1,25 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
> > +
> > +TID="generic_07"
> > +ERR_CODE=0
> > +
> > +_prep_test "generic" "test UBLK_F_NEED_GET_DATA"
> > +
> > +_create_backfile 0 256M
> > +dev_id=$(_add_ublk_dev -t loop -q 2 -g 1 "${UBLK_BACKFILES[0]}")
> > +_check_add_dev $TID $?
> > +
> > +# run fio over the ublk disk
> > +if ! _run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M; then
> > + _cleanup_test "generic"
> > + _show_result $TID 255
>
> Propagate the return code from fio?
OK.
>
> > +fi
>
> All the other tests that use _run_fio_verify_io appear to check
> _have_program fio first. Is that necessary here too?
Will add the check.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/4] ublk: enhance check for register/unregister io buffer command
2025-04-26 9:41 [PATCH 0/4] ublk: two fixes and support UBLK_F_AUTO_ZERO_COPY Ming Lei
2025-04-26 9:41 ` [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA Ming Lei
@ 2025-04-26 9:41 ` Ming Lei
2025-04-26 20:38 ` Caleb Sander Mateos
2025-04-26 9:41 ` [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY Ming Lei
2025-04-26 9:41 ` [PATCH 4/4] selftests: ublk: support UBLK_F_AUTO_ZERO_COPY Ming Lei
3 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-26 9:41 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Uday Shankar, Caleb Sander Mateos, Ming Lei
The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect
register/unregister io buffer easily, so check it before calling
starting to register/un-register io buffer.
Also only allow io buffer register/unregister uring_cmd in case of
UBLK_F_SUPPORT_ZERO_COPY.
Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 40f971a66d3e..347790b3a633 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -609,6 +609,11 @@ static void ublk_apply_params(struct ublk_device *ub)
ublk_dev_param_zoned_apply(ub);
}
+static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
+{
+ return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
+}
+
static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
{
return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
@@ -1950,9 +1955,16 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
unsigned int index, unsigned int issue_flags)
{
struct ublk_device *ub = cmd->file->private_data;
+ struct ublk_io *io = &ubq->ios[tag];
struct request *req;
int ret;
+ if (!ublk_support_zero_copy(ubq))
+ return -EINVAL;
+
+ if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
+ return -EINVAL;
+
req = __ublk_check_and_get_req(ub, ubq, tag, 0);
if (!req)
return -EINVAL;
@@ -1968,8 +1980,17 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
}
static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
+ struct ublk_queue *ubq, unsigned int tag,
unsigned int index, unsigned int issue_flags)
{
+ struct ublk_io *io = &ubq->ios[tag];
+
+ if (!ublk_support_zero_copy(ubq))
+ return -EINVAL;
+
+ if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
+ return -EINVAL;
+
return io_buffer_unregister_bvec(cmd, index, issue_flags);
}
@@ -2073,7 +2094,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
case UBLK_IO_REGISTER_IO_BUF:
return ublk_register_io_buf(cmd, ubq, tag, ub_cmd->addr, issue_flags);
case UBLK_IO_UNREGISTER_IO_BUF:
- return ublk_unregister_io_buf(cmd, ub_cmd->addr, issue_flags);
+ return ublk_unregister_io_buf(cmd, ubq, tag, ub_cmd->addr, issue_flags);
case UBLK_IO_FETCH_REQ:
ret = ublk_fetch(cmd, ubq, io, ub_cmd->addr);
if (ret)
--
2.47.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 2/4] ublk: enhance check for register/unregister io buffer command
2025-04-26 9:41 ` [PATCH 2/4] ublk: enhance check for register/unregister io buffer command Ming Lei
@ 2025-04-26 20:38 ` Caleb Sander Mateos
2025-04-27 1:37 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-04-26 20:38 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect
> register/unregister io buffer easily, so check it before calling
> starting to register/un-register io buffer.
>
> Also only allow io buffer register/unregister uring_cmd in case of
> UBLK_F_SUPPORT_ZERO_COPY.
Indeed, both these checks make sense. (Hopefully there aren't any
applications depending on the ability to use ublk zero-copy without
setting the flag.) I too was thinking of adding the
UBLK_IO_FLAG_OWNED_BY_SRV check because it could allow the
kref_get_unless_zero() to be replaced with the cheaper kref_get(). I
think the checks could be split into 2 separate commits, but up to
you.
>
> Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 40f971a66d3e..347790b3a633 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -609,6 +609,11 @@ static void ublk_apply_params(struct ublk_device *ub)
> ublk_dev_param_zoned_apply(ub);
> }
>
> +static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> +{
> + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> +}
> +
> static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> {
> return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> @@ -1950,9 +1955,16 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> unsigned int index, unsigned int issue_flags)
> {
> struct ublk_device *ub = cmd->file->private_data;
> + struct ublk_io *io = &ubq->ios[tag];
I thought you had mentioned in
https://lore.kernel.org/linux-block/aAmYJxaV1-yWEMRo@fedora/ wanting
to the ability to offload the ublk zero-copy buffer registration to a
thread other than ubq_daemon. Are you still planning to do that, or
does the "auto-register" feature supplant the need for that? Accessing
the ublk_io here only seems safe when on the ubq_daemon thread.
> struct request *req;
> int ret;
>
> + if (!ublk_support_zero_copy(ubq))
> + return -EINVAL;
> +
> + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> + return -EINVAL;
Every opcode except UBLK_IO_FETCH_REQ now checks io->flags &
UBLK_IO_FLAG_OWNED_BY_SRV. Maybe it would make sense to lift the check
up to __ublk_ch_uring_cmd() to avoid duplicating it?
Best,
Caleb
> +
> req = __ublk_check_and_get_req(ub, ubq, tag, 0);
> if (!req)
> return -EINVAL;
> @@ -1968,8 +1980,17 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> }
>
> static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
> + struct ublk_queue *ubq, unsigned int tag,
> unsigned int index, unsigned int issue_flags)
> {
> + struct ublk_io *io = &ubq->ios[tag];
> +
> + if (!ublk_support_zero_copy(ubq))
> + return -EINVAL;
> +
> + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> + return -EINVAL;
> +
> return io_buffer_unregister_bvec(cmd, index, issue_flags);
> }
>
> @@ -2073,7 +2094,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> case UBLK_IO_REGISTER_IO_BUF:
> return ublk_register_io_buf(cmd, ubq, tag, ub_cmd->addr, issue_flags);
> case UBLK_IO_UNREGISTER_IO_BUF:
> - return ublk_unregister_io_buf(cmd, ub_cmd->addr, issue_flags);
> + return ublk_unregister_io_buf(cmd, ubq, tag, ub_cmd->addr, issue_flags);
> case UBLK_IO_FETCH_REQ:
> ret = ublk_fetch(cmd, ubq, io, ub_cmd->addr);
> if (ret)
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 2/4] ublk: enhance check for register/unregister io buffer command
2025-04-26 20:38 ` Caleb Sander Mateos
@ 2025-04-27 1:37 ` Ming Lei
2025-04-27 3:14 ` Caleb Sander Mateos
0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-27 1:37 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 01:38:14PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect
> > register/unregister io buffer easily, so check it before calling
> > starting to register/un-register io buffer.
> >
> > Also only allow io buffer register/unregister uring_cmd in case of
> > UBLK_F_SUPPORT_ZERO_COPY.
>
> Indeed, both these checks make sense. (Hopefully there aren't any
> applications depending on the ability to use ublk zero-copy without
> setting the flag.) I too was thinking of adding the
> UBLK_IO_FLAG_OWNED_BY_SRV check because it could allow the
> kref_get_unless_zero() to be replaced with the cheaper kref_get(). I
> think the checks could be split into 2 separate commits, but up to
> you.
Let's do it in single patch for making everyone easier.
>
> >
> > Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 23 ++++++++++++++++++++++-
> > 1 file changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 40f971a66d3e..347790b3a633 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -609,6 +609,11 @@ static void ublk_apply_params(struct ublk_device *ub)
> > ublk_dev_param_zoned_apply(ub);
> > }
> >
> > +static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > +{
> > + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > +}
> > +
> > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > {
> > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > @@ -1950,9 +1955,16 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > unsigned int index, unsigned int issue_flags)
> > {
> > struct ublk_device *ub = cmd->file->private_data;
> > + struct ublk_io *io = &ubq->ios[tag];
>
> I thought you had mentioned in
> https://lore.kernel.org/linux-block/aAmYJxaV1-yWEMRo@fedora/ wanting
> to the ability to offload the ublk zero-copy buffer registration to a
> thread other than ubq_daemon. Are you still planning to do that, or
> does the "auto-register" feature supplant the need for that?
The auto-register idea is actually thought of when I was working on ublk
selftest offload function.
If this auto-register feature is supported, it becomes less important to
relax the ubq_daemon limit for register_io_buffer command, then I jump
on this feature & post put the patch.
But I will continue to work on the offload test code and finally relax
the limit for register/unregister io buffer command, hope it can be
done in next week.
> Accessing
> the ublk_io here only seems safe when on the ubq_daemon thread.
Both ublk_register_io_buf()/ublk_unregister_io_buf() just reads ublk_io or
the request buffer only, so it is just fine for the two to run from other
contexts.
>
> > struct request *req;
> > int ret;
> >
> > + if (!ublk_support_zero_copy(ubq))
> > + return -EINVAL;
> > +
> > + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > + return -EINVAL;
>
> Every opcode except UBLK_IO_FETCH_REQ now checks io->flags &
> UBLK_IO_FLAG_OWNED_BY_SRV. Maybe it would make sense to lift the check
> up to __ublk_ch_uring_cmd() to avoid duplicating it?
Good point.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 2/4] ublk: enhance check for register/unregister io buffer command
2025-04-27 1:37 ` Ming Lei
@ 2025-04-27 3:14 ` Caleb Sander Mateos
2025-04-27 3:49 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-04-27 3:14 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 6:37 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Sat, Apr 26, 2025 at 01:38:14PM -0700, Caleb Sander Mateos wrote:
> > On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect
> > > register/unregister io buffer easily, so check it before calling
> > > starting to register/un-register io buffer.
> > >
> > > Also only allow io buffer register/unregister uring_cmd in case of
> > > UBLK_F_SUPPORT_ZERO_COPY.
> >
> > Indeed, both these checks make sense. (Hopefully there aren't any
> > applications depending on the ability to use ublk zero-copy without
> > setting the flag.) I too was thinking of adding the
> > UBLK_IO_FLAG_OWNED_BY_SRV check because it could allow the
> > kref_get_unless_zero() to be replaced with the cheaper kref_get(). I
> > think the checks could be split into 2 separate commits, but up to
> > you.
>
> Let's do it in single patch for making everyone easier.
>
> >
> > >
> > > Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec")
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > drivers/block/ublk_drv.c | 23 ++++++++++++++++++++++-
> > > 1 file changed, 22 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 40f971a66d3e..347790b3a633 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -609,6 +609,11 @@ static void ublk_apply_params(struct ublk_device *ub)
> > > ublk_dev_param_zoned_apply(ub);
> > > }
> > >
> > > +static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > > +{
> > > + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > > +}
> > > +
> > > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > > {
> > > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > @@ -1950,9 +1955,16 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > > unsigned int index, unsigned int issue_flags)
> > > {
> > > struct ublk_device *ub = cmd->file->private_data;
> > > + struct ublk_io *io = &ubq->ios[tag];
> >
> > I thought you had mentioned in
> > https://lore.kernel.org/linux-block/aAmYJxaV1-yWEMRo@fedora/ wanting
> > to the ability to offload the ublk zero-copy buffer registration to a
> > thread other than ubq_daemon. Are you still planning to do that, or
> > does the "auto-register" feature supplant the need for that?
>
> The auto-register idea is actually thought of when I was working on ublk
> selftest offload function.
>
> If this auto-register feature is supported, it becomes less important to
> relax the ubq_daemon limit for register_io_buffer command, then I jump
> on this feature & post put the patch.
>
> But I will continue to work on the offload test code and finally relax
> the limit for register/unregister io buffer command, hope it can be
> done in next week.
>
> > Accessing
> > the ublk_io here only seems safe when on the ubq_daemon thread.
>
> Both ublk_register_io_buf()/ublk_unregister_io_buf() just reads ublk_io or
> the request buffer only, so it is just fine for the two to run from other
> contexts.
Isn't it racy to check io->flags when it could be concurrently
modified by another thread (the ubq_daemon)?
Best,
Caleb
>
> >
> > > struct request *req;
> > > int ret;
> > >
> > > + if (!ublk_support_zero_copy(ubq))
> > > + return -EINVAL;
> > > +
> > > + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > > + return -EINVAL;
> >
> > Every opcode except UBLK_IO_FETCH_REQ now checks io->flags &
> > UBLK_IO_FLAG_OWNED_BY_SRV. Maybe it would make sense to lift the check
> > up to __ublk_ch_uring_cmd() to avoid duplicating it?
>
> Good point.
>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 2/4] ublk: enhance check for register/unregister io buffer command
2025-04-27 3:14 ` Caleb Sander Mateos
@ 2025-04-27 3:49 ` Ming Lei
0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2025-04-27 3:49 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 08:14:18PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 6:37 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Sat, Apr 26, 2025 at 01:38:14PM -0700, Caleb Sander Mateos wrote:
> > > On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect
> > > > register/unregister io buffer easily, so check it before calling
> > > > starting to register/un-register io buffer.
> > > >
> > > > Also only allow io buffer register/unregister uring_cmd in case of
> > > > UBLK_F_SUPPORT_ZERO_COPY.
> > >
> > > Indeed, both these checks make sense. (Hopefully there aren't any
> > > applications depending on the ability to use ublk zero-copy without
> > > setting the flag.) I too was thinking of adding the
> > > UBLK_IO_FLAG_OWNED_BY_SRV check because it could allow the
> > > kref_get_unless_zero() to be replaced with the cheaper kref_get(). I
> > > think the checks could be split into 2 separate commits, but up to
> > > you.
> >
> > Let's do it in single patch for making everyone easier.
> >
> > >
> > > >
> > > > Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec")
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/block/ublk_drv.c | 23 ++++++++++++++++++++++-
> > > > 1 file changed, 22 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index 40f971a66d3e..347790b3a633 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -609,6 +609,11 @@ static void ublk_apply_params(struct ublk_device *ub)
> > > > ublk_dev_param_zoned_apply(ub);
> > > > }
> > > >
> > > > +static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > > > +{
> > > > + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > > > +}
> > > > +
> > > > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > > > {
> > > > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > > @@ -1950,9 +1955,16 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > > > unsigned int index, unsigned int issue_flags)
> > > > {
> > > > struct ublk_device *ub = cmd->file->private_data;
> > > > + struct ublk_io *io = &ubq->ios[tag];
> > >
> > > I thought you had mentioned in
> > > https://lore.kernel.org/linux-block/aAmYJxaV1-yWEMRo@fedora/ wanting
> > > to the ability to offload the ublk zero-copy buffer registration to a
> > > thread other than ubq_daemon. Are you still planning to do that, or
> > > does the "auto-register" feature supplant the need for that?
> >
> > The auto-register idea is actually thought of when I was working on ublk
> > selftest offload function.
> >
> > If this auto-register feature is supported, it becomes less important to
> > relax the ubq_daemon limit for register_io_buffer command, then I jump
> > on this feature & post put the patch.
> >
> > But I will continue to work on the offload test code and finally relax
> > the limit for register/unregister io buffer command, hope it can be
> > done in next week.
> >
> > > Accessing
> > > the ublk_io here only seems safe when on the ubq_daemon thread.
> >
> > Both ublk_register_io_buf()/ublk_unregister_io_buf() just reads ublk_io or
> > the request buffer only, so it is just fine for the two to run from other
> > contexts.
>
> Isn't it racy to check io->flags when it could be concurrently
> modified by another thread (the ubq_daemon)?
Good question!
Yeah, it becomes tricky if registering buffer from other pthread, such as:
- one io handler thread is registering buffer for tag 0 from cpu 0
- UBLK_IO_COMMIT_AND_FETCH_REQ comes on tag 0 from one bad ublk daemon
Then the io handler thread may observe UBLK_IO_FLAG_OWNED_BY_SRV, but
meantime UBLK_IO_COMMIT_AND_FETCH_REQ clears it and completes the request,
and this request may be freed or recycled immediately. Then the io handler
pthread sees wrong request data.
The approach I mentioned in the following link may help to support 'offload':
https://lore.kernel.org/linux-block/aAscRPVcTBiBHNe7@fedora/
The nice thing is that one batch of commands can be delivered via single or
multiple READ_MULTISHOT, and per-queue spin lock can be used. Same with io
command completion side. And it becomes easier to remove the ubq_daemon
constraint with the per-queue lock.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-26 9:41 [PATCH 0/4] ublk: two fixes and support UBLK_F_AUTO_ZERO_COPY Ming Lei
2025-04-26 9:41 ` [PATCH 1/4] selftests: ublk: fix UBLK_F_NEED_GET_DATA Ming Lei
2025-04-26 9:41 ` [PATCH 2/4] ublk: enhance check for register/unregister io buffer command Ming Lei
@ 2025-04-26 9:41 ` Ming Lei
2025-04-26 22:42 ` Caleb Sander Mateos
2025-04-26 9:41 ` [PATCH 4/4] selftests: ublk: support UBLK_F_AUTO_ZERO_COPY Ming Lei
3 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-26 9:41 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Uday Shankar, Caleb Sander Mateos, Ming Lei
UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
register/unregister uring_cmd for each IO, this way is not only inefficient,
but also introduce dependency between buffer consumer and buffer register/
unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
Add feature UBLK_F_AUTO_ZERO_COPY for addressing the existed zero copy
limit:
- register request buffer automatically before delivering io command to
ublk server
- unregister request buffer automatically when completing the request
- io_uring will unregister the buffer automatically when uring is
exiting, so we needn't worry about accident exit
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 87 +++++++++++++++++++++++++++--------
include/uapi/linux/ublk_cmd.h | 20 ++++++++
2 files changed, 89 insertions(+), 18 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 347790b3a633..453767f91431 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -64,7 +64,8 @@
| UBLK_F_CMD_IOCTL_ENCODE \
| UBLK_F_USER_COPY \
| UBLK_F_ZONED \
- | UBLK_F_USER_RECOVERY_FAIL_IO)
+ | UBLK_F_USER_RECOVERY_FAIL_IO \
+ | UBLK_F_AUTO_ZERO_COPY)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
| UBLK_F_USER_RECOVERY_REISSUE \
@@ -131,6 +132,14 @@ struct ublk_uring_cmd_pdu {
*/
#define UBLK_IO_FLAG_NEED_GET_DATA 0x08
+/*
+ * request buffer is registered automatically, so we have to unregister it
+ * before completing this request.
+ *
+ * io_uring will unregister buffer automatically for us during exiting.
+ */
+#define UBLK_IO_FLAG_UNREG_BUF 0x10
+
/* atomic RW with ubq->cancel_lock */
#define UBLK_IO_FLAG_CANCELED 0x80000000
@@ -207,7 +216,8 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
int tag);
static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
{
- return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
+ return ub->dev_info.flags & (UBLK_F_USER_COPY |
+ UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY);
}
static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
@@ -614,6 +624,11 @@ static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
}
+static inline bool ublk_support_auto_zc(const struct ublk_queue *ubq)
+{
+ return ubq->flags & UBLK_F_AUTO_ZERO_COPY;
+}
+
static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
{
return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
@@ -621,7 +636,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
{
- return !ublk_support_user_copy(ubq);
+ return !ublk_support_user_copy(ubq) && !ublk_support_auto_zc(ubq);
}
static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
@@ -629,17 +644,21 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
/*
* read()/write() is involved in user copy, so request reference
* has to be grabbed
+ *
+ * For auto zc, ublk server still may issue UBLK_IO_COMMIT_AND_FETCH_REQ
+ * before one registered buffer is used up, so reference is
+ * required too.
*/
- return ublk_support_user_copy(ubq);
+ return ublk_support_user_copy(ubq) || ublk_support_auto_zc(ubq);
}
static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
- struct request *req)
+ struct request *req, int init_ref)
{
if (ublk_need_req_ref(ubq)) {
struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
- kref_init(&data->ref);
+ refcount_set(&data->ref.refcount, init_ref);
}
}
@@ -667,6 +686,15 @@ static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
}
}
+/* for ublk zero copy */
+static void ublk_io_release(void *priv)
+{
+ struct request *rq = priv;
+ struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+
+ ublk_put_req_ref(ubq, rq);
+}
+
static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
{
return ubq->flags & UBLK_F_NEED_GET_DATA;
@@ -1231,7 +1259,22 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
mapped_bytes >> 9;
}
- ublk_init_req_ref(ubq, req);
+ if (ublk_support_auto_zc(ubq) && ublk_rq_has_data(req)) {
+ int ret;
+
+ /* one extra reference is dropped by ublk_io_release */
+ ublk_init_req_ref(ubq, req, 2);
+ ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release,
+ tag, issue_flags);
+ if (ret) {
+ blk_mq_end_request(req, BLK_STS_IOERR);
+ return;
+ }
+ io->flags |= UBLK_IO_FLAG_UNREG_BUF;
+ } else {
+ ublk_init_req_ref(ubq, req, 1);
+ }
+
ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
}
@@ -1593,7 +1636,8 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
}
static void ublk_commit_completion(struct ublk_device *ub,
- const struct ublksrv_io_cmd *ub_cmd)
+ const struct ublksrv_io_cmd *ub_cmd,
+ unsigned int issue_flags)
{
u32 qid = ub_cmd->q_id, tag = ub_cmd->tag;
struct ublk_queue *ubq = ublk_get_queue(ub, qid);
@@ -1604,6 +1648,15 @@ static void ublk_commit_completion(struct ublk_device *ub,
io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
io->res = ub_cmd->result;
+ if (io->flags & UBLK_IO_FLAG_UNREG_BUF) {
+ int ret;
+
+ ret = io_buffer_unregister_bvec(io->cmd, tag, issue_flags);
+ if (ret)
+ io->res = ret;
+ io->flags &= ~UBLK_IO_FLAG_UNREG_BUF;
+ }
+
/* find the io request and complete */
req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);
if (WARN_ON_ONCE(unlikely(!req)))
@@ -1942,14 +1995,6 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
io_uring_cmd_mark_cancelable(cmd, issue_flags);
}
-static void ublk_io_release(void *priv)
-{
- struct request *rq = priv;
- struct ublk_queue *ubq = rq->mq_hctx->driver_data;
-
- ublk_put_req_ref(ubq, rq);
-}
-
static int ublk_register_io_buf(struct io_uring_cmd *cmd,
struct ublk_queue *ubq, unsigned int tag,
unsigned int index, unsigned int issue_flags)
@@ -2124,7 +2169,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
}
ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
- ublk_commit_completion(ub, ub_cmd);
+ ublk_commit_completion(ub, ub_cmd, issue_flags);
break;
case UBLK_IO_NEED_GET_DATA:
if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
@@ -2730,6 +2775,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
return -EINVAL;
}
+ /* F_AUTO_ZERO_COPY and F_SUPPORT_ZERO_COPY can't co-exist */
+ if ((info.flags & UBLK_F_AUTO_ZERO_COPY) &&
+ (info.flags & UBLK_F_SUPPORT_ZERO_COPY))
+ return -EINVAL;
+
/*
* unprivileged device can't be trusted, but RECOVERY and
* RECOVERY_REISSUE still may hang error handling, so can't
@@ -2747,7 +2797,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
* buffer by pwrite() to ublk char device, which can't be
* used for unprivileged device
*/
- if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY))
+ if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
+ UBLK_F_AUTO_ZERO_COPY))
return -EINVAL;
}
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 583b86681c93..d8bb5e55cce7 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -211,6 +211,26 @@
*/
#define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
+/*
+ * request buffer is registered automatically before delivering this io
+ * command to ublk server, meantime it is un-registered automatically
+ * when completing this io command.
+ *
+ * request tag has to be used as the buffer index, and ublk server has to
+ * pass this IO's tag as buffer index for using the registered zero copy
+ * buffer
+ *
+ * This way avoids extra uring_cmd cost, but also simplifies backend
+ * implementation, such as, the dependency on IO_REGISTER_IO_BUF and
+ * IO_UNREGISTER_IO_BUF becomes not necessary.
+ *
+ * For using this feature, ublk server has to register buffer table
+ * in sparse way, and buffer number has to be >= ublk queue depth.
+ *
+ * This feature is preferred to UBLK_F_SUPPORT_ZERO_COPY.
+ */
+#define UBLK_F_AUTO_ZERO_COPY (1ULL << 10)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
--
2.47.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-26 9:41 ` [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY Ming Lei
@ 2025-04-26 22:42 ` Caleb Sander Mateos
2025-04-27 2:06 ` Ming Lei
2025-04-27 2:34 ` Keith Busch
0 siblings, 2 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-04-26 22:42 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> register/unregister uring_cmd for each IO, this way is not only inefficient,
> but also introduce dependency between buffer consumer and buffer register/
> unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
This is a great idea!
>
> Add feature UBLK_F_AUTO_ZERO_COPY for addressing the existed zero copy
> limit:
nit: "existed" -> "existing", "limit" -> "limitation"
>
> - register request buffer automatically before delivering io command to
> ublk server
>
> - unregister request buffer automatically when completing the request
>
> - io_uring will unregister the buffer automatically when uring is
> exiting, so we needn't worry about accident exit
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 87 +++++++++++++++++++++++++++--------
> include/uapi/linux/ublk_cmd.h | 20 ++++++++
> 2 files changed, 89 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 347790b3a633..453767f91431 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -64,7 +64,8 @@
> | UBLK_F_CMD_IOCTL_ENCODE \
> | UBLK_F_USER_COPY \
> | UBLK_F_ZONED \
> - | UBLK_F_USER_RECOVERY_FAIL_IO)
> + | UBLK_F_USER_RECOVERY_FAIL_IO \
> + | UBLK_F_AUTO_ZERO_COPY)
>
> #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> | UBLK_F_USER_RECOVERY_REISSUE \
> @@ -131,6 +132,14 @@ struct ublk_uring_cmd_pdu {
> */
> #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
>
> +/*
> + * request buffer is registered automatically, so we have to unregister it
> + * before completing this request.
> + *
> + * io_uring will unregister buffer automatically for us during exiting.
> + */
> +#define UBLK_IO_FLAG_UNREG_BUF 0x10
> +
> /* atomic RW with ubq->cancel_lock */
> #define UBLK_IO_FLAG_CANCELED 0x80000000
>
> @@ -207,7 +216,8 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
> int tag);
> static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> {
> - return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> + return ub->dev_info.flags & (UBLK_F_USER_COPY |
> + UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY);
> }
>
> static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> @@ -614,6 +624,11 @@ static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> }
>
> +static inline bool ublk_support_auto_zc(const struct ublk_queue *ubq)
> +{
> + return ubq->flags & UBLK_F_AUTO_ZERO_COPY;
> +}
> +
> static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> {
> return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> @@ -621,7 +636,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
>
> static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
> {
> - return !ublk_support_user_copy(ubq);
> + return !ublk_support_user_copy(ubq) && !ublk_support_auto_zc(ubq);
> }
>
> static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> @@ -629,17 +644,21 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> /*
> * read()/write() is involved in user copy, so request reference
> * has to be grabbed
> + *
> + * For auto zc, ublk server still may issue UBLK_IO_COMMIT_AND_FETCH_REQ
> + * before one registered buffer is used up, so reference is
> + * required too.
> */
> - return ublk_support_user_copy(ubq);
> + return ublk_support_user_copy(ubq) || ublk_support_auto_zc(ubq);
Since both places where ublk_support_user_copy() is used are being
adjusted to also check ublk_support_auto_zc(), maybe
ublk_support_user_copy() should just be changed to check
UBLK_F_AUTO_ZERO_COPY too?
> }
>
> static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
> - struct request *req)
> + struct request *req, int init_ref)
> {
> if (ublk_need_req_ref(ubq)) {
> struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
>
> - kref_init(&data->ref);
> + refcount_set(&data->ref.refcount, init_ref);
It might be nicer not to mix kref and raw reference count operations.
Maybe you could add a prep patch that switches from struct kref to
refcount_t?
> }
> }
>
> @@ -667,6 +686,15 @@ static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
> }
> }
>
> +/* for ublk zero copy */
> +static void ublk_io_release(void *priv)
> +{
> + struct request *rq = priv;
> + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> +
> + ublk_put_req_ref(ubq, rq);
> +}
> +
> static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
> {
> return ubq->flags & UBLK_F_NEED_GET_DATA;
> @@ -1231,7 +1259,22 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
> mapped_bytes >> 9;
> }
>
> - ublk_init_req_ref(ubq, req);
> + if (ublk_support_auto_zc(ubq) && ublk_rq_has_data(req)) {
> + int ret;
> +
> + /* one extra reference is dropped by ublk_io_release */
> + ublk_init_req_ref(ubq, req, 2);
> + ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release,
> + tag, issue_flags);
Using the ublk request tag as the registered buffer index may be too
limiting. It would cause collisions if there are multiple ublk devices
or queues handled on a single io_uring. It also requires offsetting
any registered user buffers so their indices come after all the ublk
ones, which may be difficult if ublk devices are added dynamically.
Perhaps the UBLK_IO_FETCH_REQ operation could specify the registered
buffer index to use for the request?
This also requires the io_uring issuing the zero-copy I/Os to be the
same as the one submitting the ublk fetch requests. That would also
make it difficult for us to adopt for our ublk server, which uses
separate io_urings. Not sure if there is a simple way the ublk server
could specify what io_uring to register the ublk zero-copy buffers
with.
> + if (ret) {
> + blk_mq_end_request(req, BLK_STS_IOERR);
> + return;
Does this leak the ublk fetch request? Seems like it should be
completed to the ublk server with an error code.
> + }
> + io->flags |= UBLK_IO_FLAG_UNREG_BUF;
> + } else {
> + ublk_init_req_ref(ubq, req, 1);
> + }
> +
> ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
> }
>
> @@ -1593,7 +1636,8 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
> }
>
> static void ublk_commit_completion(struct ublk_device *ub,
> - const struct ublksrv_io_cmd *ub_cmd)
> + const struct ublksrv_io_cmd *ub_cmd,
> + unsigned int issue_flags)
> {
> u32 qid = ub_cmd->q_id, tag = ub_cmd->tag;
> struct ublk_queue *ubq = ublk_get_queue(ub, qid);
> @@ -1604,6 +1648,15 @@ static void ublk_commit_completion(struct ublk_device *ub,
> io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
> io->res = ub_cmd->result;
>
> + if (io->flags & UBLK_IO_FLAG_UNREG_BUF) {
> + int ret;
> +
> + ret = io_buffer_unregister_bvec(io->cmd, tag, issue_flags);
> + if (ret)
> + io->res = ret;
I think it would be confusing to report an error to the application
submitting the I/O if unregistration fails. The only scenario where it
seems possible for this to fail is if userspace issues an
IORING_REGISTER_BUFFERS_UPDATE that overwrites the registered buffer
slot while the ublk request is being handled by the ublk server. I
would either ignore any error from io_buffer_unregister_bvec() or
return it to the ublk server.
> + io->flags &= ~UBLK_IO_FLAG_UNREG_BUF;
> + }
> +
> /* find the io request and complete */
> req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);
> if (WARN_ON_ONCE(unlikely(!req)))
> @@ -1942,14 +1995,6 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
> io_uring_cmd_mark_cancelable(cmd, issue_flags);
> }
>
> -static void ublk_io_release(void *priv)
> -{
> - struct request *rq = priv;
> - struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> -
> - ublk_put_req_ref(ubq, rq);
> -}
> -
> static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> struct ublk_queue *ubq, unsigned int tag,
> unsigned int index, unsigned int issue_flags)
> @@ -2124,7 +2169,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> }
>
> ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> - ublk_commit_completion(ub, ub_cmd);
> + ublk_commit_completion(ub, ub_cmd, issue_flags);
> break;
> case UBLK_IO_NEED_GET_DATA:
> if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> @@ -2730,6 +2775,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> return -EINVAL;
> }
>
> + /* F_AUTO_ZERO_COPY and F_SUPPORT_ZERO_COPY can't co-exist */
> + if ((info.flags & UBLK_F_AUTO_ZERO_COPY) &&
> + (info.flags & UBLK_F_SUPPORT_ZERO_COPY))
> + return -EINVAL;
> +
> /*
> * unprivileged device can't be trusted, but RECOVERY and
> * RECOVERY_REISSUE still may hang error handling, so can't
> @@ -2747,7 +2797,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> * buffer by pwrite() to ublk char device, which can't be
> * used for unprivileged device
> */
> - if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY))
> + if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
> + UBLK_F_AUTO_ZERO_COPY))
> return -EINVAL;
> }
>
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 583b86681c93..d8bb5e55cce7 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -211,6 +211,26 @@
> */
> #define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
>
> +/*
> + * request buffer is registered automatically before delivering this io
> + * command to ublk server, meantime it is un-registered automatically
> + * when completing this io command.
> + *
> + * request tag has to be used as the buffer index, and ublk server has to
> + * pass this IO's tag as buffer index for using the registered zero copy
> + * buffer
> + *
> + * This way avoids extra uring_cmd cost, but also simplifies backend
> + * implementation, such as, the dependency on IO_REGISTER_IO_BUF and
> + * IO_UNREGISTER_IO_BUF becomes not necessary.
> + *
> + * For using this feature, ublk server has to register buffer table
> + * in sparse way, and buffer number has to be >= ublk queue depth.
> + *
> + * This feature is preferred to UBLK_F_SUPPORT_ZERO_COPY.
> + */
> +#define UBLK_F_AUTO_ZERO_COPY (1ULL << 10)
This flag is already taken by UBLK_F_UPDATE_SIZE in commit "ublk: Add
UBLK_U_CMD_UPDATE_SIZE". You may need to rebase on for-6.16/block.
Best,
Caleb
> +
> /* device state */
> #define UBLK_S_DEV_DEAD 0
> #define UBLK_S_DEV_LIVE 1
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-26 22:42 ` Caleb Sander Mateos
@ 2025-04-27 2:06 ` Ming Lei
2025-04-27 3:09 ` Caleb Sander Mateos
2025-04-27 2:34 ` Keith Busch
1 sibling, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-27 2:06 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 03:42:59PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> > register/unregister uring_cmd for each IO, this way is not only inefficient,
> > but also introduce dependency between buffer consumer and buffer register/
> > unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> > in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
>
> This is a great idea!
>
> >
> > Add feature UBLK_F_AUTO_ZERO_COPY for addressing the existed zero copy
> > limit:
>
> nit: "existed" -> "existing", "limit" -> "limitation"
>
> >
> > - register request buffer automatically before delivering io command to
> > ublk server
> >
> > - unregister request buffer automatically when completing the request
> >
> > - io_uring will unregister the buffer automatically when uring is
> > exiting, so we needn't worry about accident exit
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 87 +++++++++++++++++++++++++++--------
> > include/uapi/linux/ublk_cmd.h | 20 ++++++++
> > 2 files changed, 89 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 347790b3a633..453767f91431 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -64,7 +64,8 @@
> > | UBLK_F_CMD_IOCTL_ENCODE \
> > | UBLK_F_USER_COPY \
> > | UBLK_F_ZONED \
> > - | UBLK_F_USER_RECOVERY_FAIL_IO)
> > + | UBLK_F_USER_RECOVERY_FAIL_IO \
> > + | UBLK_F_AUTO_ZERO_COPY)
> >
> > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > | UBLK_F_USER_RECOVERY_REISSUE \
> > @@ -131,6 +132,14 @@ struct ublk_uring_cmd_pdu {
> > */
> > #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
> >
> > +/*
> > + * request buffer is registered automatically, so we have to unregister it
> > + * before completing this request.
> > + *
> > + * io_uring will unregister buffer automatically for us during exiting.
> > + */
> > +#define UBLK_IO_FLAG_UNREG_BUF 0x10
> > +
> > /* atomic RW with ubq->cancel_lock */
> > #define UBLK_IO_FLAG_CANCELED 0x80000000
> >
> > @@ -207,7 +216,8 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
> > int tag);
> > static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> > {
> > - return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > + return ub->dev_info.flags & (UBLK_F_USER_COPY |
> > + UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY);
> > }
> >
> > static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> > @@ -614,6 +624,11 @@ static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > }
> >
> > +static inline bool ublk_support_auto_zc(const struct ublk_queue *ubq)
> > +{
> > + return ubq->flags & UBLK_F_AUTO_ZERO_COPY;
> > +}
> > +
> > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > {
> > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > @@ -621,7 +636,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> >
> > static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
> > {
> > - return !ublk_support_user_copy(ubq);
> > + return !ublk_support_user_copy(ubq) && !ublk_support_auto_zc(ubq);
> > }
> >
> > static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > @@ -629,17 +644,21 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > /*
> > * read()/write() is involved in user copy, so request reference
> > * has to be grabbed
> > + *
> > + * For auto zc, ublk server still may issue UBLK_IO_COMMIT_AND_FETCH_REQ
> > + * before one registered buffer is used up, so reference is
> > + * required too.
> > */
> > - return ublk_support_user_copy(ubq);
> > + return ublk_support_user_copy(ubq) || ublk_support_auto_zc(ubq);
>
> Since both places where ublk_support_user_copy() is used are being
> adjusted to also check ublk_support_auto_zc(), maybe
> ublk_support_user_copy() should just be changed to check
> UBLK_F_AUTO_ZERO_COPY too?
I think that isn't good, we should have let each flag to cover its feature only.
That said it was not good to add zero copy check in both ublk_support_user_copy()
and ublk_dev_is_user_copy().
If ublk server needs user copy, it should have enabled it explicitly.
>
> > }
> >
> > static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
> > - struct request *req)
> > + struct request *req, int init_ref)
> > {
> > if (ublk_need_req_ref(ubq)) {
> > struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> >
> > - kref_init(&data->ref);
> > + refcount_set(&data->ref.refcount, init_ref);
>
> It might be nicer not to mix kref and raw reference count operations.
> Maybe you could add a prep patch that switches from struct kref to
> refcount_t?
That is fine, or probably kref need to provide one variant of __kref_init(nr).
>
> > }
> > }
> >
> > @@ -667,6 +686,15 @@ static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
> > }
> > }
> >
> > +/* for ublk zero copy */
> > +static void ublk_io_release(void *priv)
> > +{
> > + struct request *rq = priv;
> > + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > +
> > + ublk_put_req_ref(ubq, rq);
> > +}
> > +
> > static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
> > {
> > return ubq->flags & UBLK_F_NEED_GET_DATA;
> > @@ -1231,7 +1259,22 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
> > mapped_bytes >> 9;
> > }
> >
> > - ublk_init_req_ref(ubq, req);
> > + if (ublk_support_auto_zc(ubq) && ublk_rq_has_data(req)) {
> > + int ret;
> > +
> > + /* one extra reference is dropped by ublk_io_release */
> > + ublk_init_req_ref(ubq, req, 2);
> > + ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release,
> > + tag, issue_flags);
>
> Using the ublk request tag as the registered buffer index may be too
> limiting. It would cause collisions if there are multiple ublk devices
> or queues handled on a single io_uring. It also requires offsetting
> any registered user buffers so their indices come after all the ublk
> ones, which may be difficult if ublk devices are added dynamically.
> Perhaps the UBLK_IO_FETCH_REQ operation could specify the registered
> buffer index to use for the request?
>
> This also requires the io_uring issuing the zero-copy I/Os to be the
> same as the one submitting the ublk fetch requests. That would also
> make it difficult for us to adopt for our ublk server, which uses
> separate io_urings. Not sure if there is a simple way the ublk server
> could specify what io_uring to register the ublk zero-copy buffers
> with.
I think it can be done by passing `io_uring fd` and buffer 'index' via
uring_cmd_header, there is still one u64 (->addr) left for zero copy,
then the buffer's `io_uring fd/fixed_fd` and 'index' can be provided
to io_uring core for registering buffer, this way should be flexible
enough for covering any case.
>
> > + if (ret) {
> > + blk_mq_end_request(req, BLK_STS_IOERR);
> > + return;
>
> Does this leak the ublk fetch request? Seems like it should be
It won't leak ublk uring_cmd which is just for delivering ublk
io command to ublk server.
> completed to the ublk server with an error code.
It could be hard for ublk server to deal with, I'd suggest to
start with one simple implementation first.
It is usually one bug, and user will get notified from client's
failure, then complain and ublk server gets fixed.
>
> > + }
> > + io->flags |= UBLK_IO_FLAG_UNREG_BUF;
> > + } else {
> > + ublk_init_req_ref(ubq, req, 1);
> > + }
> > +
> > ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
> > }
> >
> > @@ -1593,7 +1636,8 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
> > }
> >
> > static void ublk_commit_completion(struct ublk_device *ub,
> > - const struct ublksrv_io_cmd *ub_cmd)
> > + const struct ublksrv_io_cmd *ub_cmd,
> > + unsigned int issue_flags)
> > {
> > u32 qid = ub_cmd->q_id, tag = ub_cmd->tag;
> > struct ublk_queue *ubq = ublk_get_queue(ub, qid);
> > @@ -1604,6 +1648,15 @@ static void ublk_commit_completion(struct ublk_device *ub,
> > io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
> > io->res = ub_cmd->result;
> >
> > + if (io->flags & UBLK_IO_FLAG_UNREG_BUF) {
> > + int ret;
> > +
> > + ret = io_buffer_unregister_bvec(io->cmd, tag, issue_flags);
> > + if (ret)
> > + io->res = ret;
>
> I think it would be confusing to report an error to the application
> submitting the I/O if unregistration fails. The only scenario where it
> seems possible for this to fail is if userspace issues an
> IORING_REGISTER_BUFFERS_UPDATE that overwrites the registered buffer
> slot while the ublk request is being handled by the ublk server. I
> would either ignore any error from io_buffer_unregister_bvec() or
> return it to the ublk server.
Fair enough, maybe WARN_ON_ONCE() is enough.
>
> > + io->flags &= ~UBLK_IO_FLAG_UNREG_BUF;
> > + }
> > +
> > /* find the io request and complete */
> > req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);
> > if (WARN_ON_ONCE(unlikely(!req)))
> > @@ -1942,14 +1995,6 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
> > io_uring_cmd_mark_cancelable(cmd, issue_flags);
> > }
> >
> > -static void ublk_io_release(void *priv)
> > -{
> > - struct request *rq = priv;
> > - struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > -
> > - ublk_put_req_ref(ubq, rq);
> > -}
> > -
> > static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > struct ublk_queue *ubq, unsigned int tag,
> > unsigned int index, unsigned int issue_flags)
> > @@ -2124,7 +2169,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > }
> >
> > ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> > - ublk_commit_completion(ub, ub_cmd);
> > + ublk_commit_completion(ub, ub_cmd, issue_flags);
> > break;
> > case UBLK_IO_NEED_GET_DATA:
> > if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > @@ -2730,6 +2775,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > return -EINVAL;
> > }
> >
> > + /* F_AUTO_ZERO_COPY and F_SUPPORT_ZERO_COPY can't co-exist */
> > + if ((info.flags & UBLK_F_AUTO_ZERO_COPY) &&
> > + (info.flags & UBLK_F_SUPPORT_ZERO_COPY))
> > + return -EINVAL;
> > +
> > /*
> > * unprivileged device can't be trusted, but RECOVERY and
> > * RECOVERY_REISSUE still may hang error handling, so can't
> > @@ -2747,7 +2797,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > * buffer by pwrite() to ublk char device, which can't be
> > * used for unprivileged device
> > */
> > - if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY))
> > + if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
> > + UBLK_F_AUTO_ZERO_COPY))
> > return -EINVAL;
> > }
> >
> > diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> > index 583b86681c93..d8bb5e55cce7 100644
> > --- a/include/uapi/linux/ublk_cmd.h
> > +++ b/include/uapi/linux/ublk_cmd.h
> > @@ -211,6 +211,26 @@
> > */
> > #define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
> >
> > +/*
> > + * request buffer is registered automatically before delivering this io
> > + * command to ublk server, meantime it is un-registered automatically
> > + * when completing this io command.
> > + *
> > + * request tag has to be used as the buffer index, and ublk server has to
> > + * pass this IO's tag as buffer index for using the registered zero copy
> > + * buffer
> > + *
> > + * This way avoids extra uring_cmd cost, but also simplifies backend
> > + * implementation, such as, the dependency on IO_REGISTER_IO_BUF and
> > + * IO_UNREGISTER_IO_BUF becomes not necessary.
> > + *
> > + * For using this feature, ublk server has to register buffer table
> > + * in sparse way, and buffer number has to be >= ublk queue depth.
> > + *
> > + * This feature is preferred to UBLK_F_SUPPORT_ZERO_COPY.
> > + */
> > +#define UBLK_F_AUTO_ZERO_COPY (1ULL << 10)
>
> This flag is already taken by UBLK_F_UPDATE_SIZE in commit "ublk: Add
> UBLK_U_CMD_UPDATE_SIZE". You may need to rebase on for-6.16/block.
Good catch, will fix it in V2.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-27 2:06 ` Ming Lei
@ 2025-04-27 3:09 ` Caleb Sander Mateos
2025-04-27 3:15 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-04-27 3:09 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 7:07 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Sat, Apr 26, 2025 at 03:42:59PM -0700, Caleb Sander Mateos wrote:
> > On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> > > register/unregister uring_cmd for each IO, this way is not only inefficient,
> > > but also introduce dependency between buffer consumer and buffer register/
> > > unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> > > in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
> >
> > This is a great idea!
> >
> > >
> > > Add feature UBLK_F_AUTO_ZERO_COPY for addressing the existed zero copy
> > > limit:
> >
> > nit: "existed" -> "existing", "limit" -> "limitation"
> >
> > >
> > > - register request buffer automatically before delivering io command to
> > > ublk server
> > >
> > > - unregister request buffer automatically when completing the request
> > >
> > > - io_uring will unregister the buffer automatically when uring is
> > > exiting, so we needn't worry about accident exit
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > drivers/block/ublk_drv.c | 87 +++++++++++++++++++++++++++--------
> > > include/uapi/linux/ublk_cmd.h | 20 ++++++++
> > > 2 files changed, 89 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 347790b3a633..453767f91431 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -64,7 +64,8 @@
> > > | UBLK_F_CMD_IOCTL_ENCODE \
> > > | UBLK_F_USER_COPY \
> > > | UBLK_F_ZONED \
> > > - | UBLK_F_USER_RECOVERY_FAIL_IO)
> > > + | UBLK_F_USER_RECOVERY_FAIL_IO \
> > > + | UBLK_F_AUTO_ZERO_COPY)
> > >
> > > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > > | UBLK_F_USER_RECOVERY_REISSUE \
> > > @@ -131,6 +132,14 @@ struct ublk_uring_cmd_pdu {
> > > */
> > > #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
> > >
> > > +/*
> > > + * request buffer is registered automatically, so we have to unregister it
> > > + * before completing this request.
> > > + *
> > > + * io_uring will unregister buffer automatically for us during exiting.
> > > + */
> > > +#define UBLK_IO_FLAG_UNREG_BUF 0x10
> > > +
> > > /* atomic RW with ubq->cancel_lock */
> > > #define UBLK_IO_FLAG_CANCELED 0x80000000
> > >
> > > @@ -207,7 +216,8 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
> > > int tag);
> > > static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> > > {
> > > - return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > + return ub->dev_info.flags & (UBLK_F_USER_COPY |
> > > + UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY);
> > > }
> > >
> > > static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> > > @@ -614,6 +624,11 @@ static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > > return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > > }
> > >
> > > +static inline bool ublk_support_auto_zc(const struct ublk_queue *ubq)
> > > +{
> > > + return ubq->flags & UBLK_F_AUTO_ZERO_COPY;
> > > +}
> > > +
> > > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > > {
> > > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > @@ -621,7 +636,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > >
> > > static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
> > > {
> > > - return !ublk_support_user_copy(ubq);
> > > + return !ublk_support_user_copy(ubq) && !ublk_support_auto_zc(ubq);
> > > }
> > >
> > > static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > > @@ -629,17 +644,21 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > > /*
> > > * read()/write() is involved in user copy, so request reference
> > > * has to be grabbed
> > > + *
> > > + * For auto zc, ublk server still may issue UBLK_IO_COMMIT_AND_FETCH_REQ
> > > + * before one registered buffer is used up, so reference is
> > > + * required too.
> > > */
> > > - return ublk_support_user_copy(ubq);
> > > + return ublk_support_user_copy(ubq) || ublk_support_auto_zc(ubq);
> >
> > Since both places where ublk_support_user_copy() is used are being
> > adjusted to also check ublk_support_auto_zc(), maybe
> > ublk_support_user_copy() should just be changed to check
> > UBLK_F_AUTO_ZERO_COPY too?
>
> I think that isn't good, we should have let each flag to cover its feature only.
>
> That said it was not good to add zero copy check in both ublk_support_user_copy()
> and ublk_dev_is_user_copy().
Fair point.
>
> If ublk server needs user copy, it should have enabled it explicitly.
>
> >
> > > }
> > >
> > > static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
> > > - struct request *req)
> > > + struct request *req, int init_ref)
> > > {
> > > if (ublk_need_req_ref(ubq)) {
> > > struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > >
> > > - kref_init(&data->ref);
> > > + refcount_set(&data->ref.refcount, init_ref);
> >
> > It might be nicer not to mix kref and raw reference count operations.
> > Maybe you could add a prep patch that switches from struct kref to
> > refcount_t?
>
> That is fine, or probably kref need to provide one variant of __kref_init(nr).
>
> >
> > > }
> > > }
> > >
> > > @@ -667,6 +686,15 @@ static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
> > > }
> > > }
> > >
> > > +/* for ublk zero copy */
> > > +static void ublk_io_release(void *priv)
> > > +{
> > > + struct request *rq = priv;
> > > + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > > +
> > > + ublk_put_req_ref(ubq, rq);
> > > +}
> > > +
> > > static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
> > > {
> > > return ubq->flags & UBLK_F_NEED_GET_DATA;
> > > @@ -1231,7 +1259,22 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
> > > mapped_bytes >> 9;
> > > }
> > >
> > > - ublk_init_req_ref(ubq, req);
> > > + if (ublk_support_auto_zc(ubq) && ublk_rq_has_data(req)) {
> > > + int ret;
> > > +
> > > + /* one extra reference is dropped by ublk_io_release */
> > > + ublk_init_req_ref(ubq, req, 2);
> > > + ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release,
> > > + tag, issue_flags);
> >
> > Using the ublk request tag as the registered buffer index may be too
> > limiting. It would cause collisions if there are multiple ublk devices
> > or queues handled on a single io_uring. It also requires offsetting
> > any registered user buffers so their indices come after all the ublk
> > ones, which may be difficult if ublk devices are added dynamically.
> > Perhaps the UBLK_IO_FETCH_REQ operation could specify the registered
> > buffer index to use for the request?
> >
> > This also requires the io_uring issuing the zero-copy I/Os to be the
> > same as the one submitting the ublk fetch requests. That would also
> > make it difficult for us to adopt for our ublk server, which uses
> > separate io_urings. Not sure if there is a simple way the ublk server
> > could specify what io_uring to register the ublk zero-copy buffers
> > with.
>
> I think it can be done by passing `io_uring fd` and buffer 'index' via
> uring_cmd_header, there is still one u64 (->addr) left for zero copy,
> then the buffer's `io_uring fd/fixed_fd` and 'index' can be provided
> to io_uring core for registering buffer, this way should be flexible
> enough for covering any case.
>
> >
> > > + if (ret) {
> > > + blk_mq_end_request(req, BLK_STS_IOERR);
> > > + return;
> >
> > Does this leak the ublk fetch request? Seems like it should be
>
> It won't leak ublk uring_cmd which is just for delivering ublk
> io command to ublk server.
But where does the uring_cmd complete? The early return means this
will skip the call to ubq_complete_io_cmd(). Won't that leave the
uring_cmd stuck until the io_uring exits?
>
> > completed to the ublk server with an error code.
>
> It could be hard for ublk server to deal with, I'd suggest to
> start with one simple implementation first.
>
> It is usually one bug, and user will get notified from client's
> failure, then complain and ublk server gets fixed.
>
> >
> > > + }
> > > + io->flags |= UBLK_IO_FLAG_UNREG_BUF;
> > > + } else {
> > > + ublk_init_req_ref(ubq, req, 1);
> > > + }
> > > +
> > > ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
> > > }
> > >
> > > @@ -1593,7 +1636,8 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
> > > }
> > >
> > > static void ublk_commit_completion(struct ublk_device *ub,
> > > - const struct ublksrv_io_cmd *ub_cmd)
> > > + const struct ublksrv_io_cmd *ub_cmd,
> > > + unsigned int issue_flags)
> > > {
> > > u32 qid = ub_cmd->q_id, tag = ub_cmd->tag;
> > > struct ublk_queue *ubq = ublk_get_queue(ub, qid);
> > > @@ -1604,6 +1648,15 @@ static void ublk_commit_completion(struct ublk_device *ub,
> > > io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
> > > io->res = ub_cmd->result;
> > >
> > > + if (io->flags & UBLK_IO_FLAG_UNREG_BUF) {
> > > + int ret;
> > > +
> > > + ret = io_buffer_unregister_bvec(io->cmd, tag, issue_flags);
> > > + if (ret)
> > > + io->res = ret;
> >
> > I think it would be confusing to report an error to the application
> > submitting the I/O if unregistration fails. The only scenario where it
> > seems possible for this to fail is if userspace issues an
> > IORING_REGISTER_BUFFERS_UPDATE that overwrites the registered buffer
> > slot while the ublk request is being handled by the ublk server. I
> > would either ignore any error from io_buffer_unregister_bvec() or
> > return it to the ublk server.
>
> Fair enough, maybe WARN_ON_ONCE() is enough.
I think it's technically triggerable from userspace, but certainly
very unexpected.
Best,
Caleb
>
> >
> > > + io->flags &= ~UBLK_IO_FLAG_UNREG_BUF;
> > > + }
> > > +
> > > /* find the io request and complete */
> > > req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);
> > > if (WARN_ON_ONCE(unlikely(!req)))
> > > @@ -1942,14 +1995,6 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
> > > io_uring_cmd_mark_cancelable(cmd, issue_flags);
> > > }
> > >
> > > -static void ublk_io_release(void *priv)
> > > -{
> > > - struct request *rq = priv;
> > > - struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > > -
> > > - ublk_put_req_ref(ubq, rq);
> > > -}
> > > -
> > > static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > > struct ublk_queue *ubq, unsigned int tag,
> > > unsigned int index, unsigned int issue_flags)
> > > @@ -2124,7 +2169,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > > }
> > >
> > > ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> > > - ublk_commit_completion(ub, ub_cmd);
> > > + ublk_commit_completion(ub, ub_cmd, issue_flags);
> > > break;
> > > case UBLK_IO_NEED_GET_DATA:
> > > if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > > @@ -2730,6 +2775,11 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > > return -EINVAL;
> > > }
> > >
> > > + /* F_AUTO_ZERO_COPY and F_SUPPORT_ZERO_COPY can't co-exist */
> > > + if ((info.flags & UBLK_F_AUTO_ZERO_COPY) &&
> > > + (info.flags & UBLK_F_SUPPORT_ZERO_COPY))
> > > + return -EINVAL;
> > > +
> > > /*
> > > * unprivileged device can't be trusted, but RECOVERY and
> > > * RECOVERY_REISSUE still may hang error handling, so can't
> > > @@ -2747,7 +2797,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
> > > * buffer by pwrite() to ublk char device, which can't be
> > > * used for unprivileged device
> > > */
> > > - if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY))
> > > + if (info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
> > > + UBLK_F_AUTO_ZERO_COPY))
> > > return -EINVAL;
> > > }
> > >
> > > diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> > > index 583b86681c93..d8bb5e55cce7 100644
> > > --- a/include/uapi/linux/ublk_cmd.h
> > > +++ b/include/uapi/linux/ublk_cmd.h
> > > @@ -211,6 +211,26 @@
> > > */
> > > #define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
> > >
> > > +/*
> > > + * request buffer is registered automatically before delivering this io
> > > + * command to ublk server, meantime it is un-registered automatically
> > > + * when completing this io command.
> > > + *
> > > + * request tag has to be used as the buffer index, and ublk server has to
> > > + * pass this IO's tag as buffer index for using the registered zero copy
> > > + * buffer
> > > + *
> > > + * This way avoids extra uring_cmd cost, but also simplifies backend
> > > + * implementation, such as, the dependency on IO_REGISTER_IO_BUF and
> > > + * IO_UNREGISTER_IO_BUF becomes not necessary.
> > > + *
> > > + * For using this feature, ublk server has to register buffer table
> > > + * in sparse way, and buffer number has to be >= ublk queue depth.
> > > + *
> > > + * This feature is preferred to UBLK_F_SUPPORT_ZERO_COPY.
> > > + */
> > > +#define UBLK_F_AUTO_ZERO_COPY (1ULL << 10)
> >
> > This flag is already taken by UBLK_F_UPDATE_SIZE in commit "ublk: Add
> > UBLK_U_CMD_UPDATE_SIZE". You may need to rebase on for-6.16/block.
>
> Good catch, will fix it in V2.
>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-27 3:09 ` Caleb Sander Mateos
@ 2025-04-27 3:15 ` Ming Lei
0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2025-04-27 3:15 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 08:09:38PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 7:07 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Sat, Apr 26, 2025 at 03:42:59PM -0700, Caleb Sander Mateos wrote:
> > > On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> > > > register/unregister uring_cmd for each IO, this way is not only inefficient,
> > > > but also introduce dependency between buffer consumer and buffer register/
> > > > unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> > > > in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
> > >
> > > This is a great idea!
> > >
> > > >
> > > > Add feature UBLK_F_AUTO_ZERO_COPY for addressing the existed zero copy
> > > > limit:
> > >
> > > nit: "existed" -> "existing", "limit" -> "limitation"
> > >
> > > >
> > > > - register request buffer automatically before delivering io command to
> > > > ublk server
> > > >
> > > > - unregister request buffer automatically when completing the request
> > > >
> > > > - io_uring will unregister the buffer automatically when uring is
> > > > exiting, so we needn't worry about accident exit
> > > >
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/block/ublk_drv.c | 87 +++++++++++++++++++++++++++--------
> > > > include/uapi/linux/ublk_cmd.h | 20 ++++++++
> > > > 2 files changed, 89 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index 347790b3a633..453767f91431 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -64,7 +64,8 @@
> > > > | UBLK_F_CMD_IOCTL_ENCODE \
> > > > | UBLK_F_USER_COPY \
> > > > | UBLK_F_ZONED \
> > > > - | UBLK_F_USER_RECOVERY_FAIL_IO)
> > > > + | UBLK_F_USER_RECOVERY_FAIL_IO \
> > > > + | UBLK_F_AUTO_ZERO_COPY)
> > > >
> > > > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > > > | UBLK_F_USER_RECOVERY_REISSUE \
> > > > @@ -131,6 +132,14 @@ struct ublk_uring_cmd_pdu {
> > > > */
> > > > #define UBLK_IO_FLAG_NEED_GET_DATA 0x08
> > > >
> > > > +/*
> > > > + * request buffer is registered automatically, so we have to unregister it
> > > > + * before completing this request.
> > > > + *
> > > > + * io_uring will unregister buffer automatically for us during exiting.
> > > > + */
> > > > +#define UBLK_IO_FLAG_UNREG_BUF 0x10
> > > > +
> > > > /* atomic RW with ubq->cancel_lock */
> > > > #define UBLK_IO_FLAG_CANCELED 0x80000000
> > > >
> > > > @@ -207,7 +216,8 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
> > > > int tag);
> > > > static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> > > > {
> > > > - return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > > + return ub->dev_info.flags & (UBLK_F_USER_COPY |
> > > > + UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY);
> > > > }
> > > >
> > > > static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> > > > @@ -614,6 +624,11 @@ static inline bool ublk_support_zero_copy(const struct ublk_queue *ubq)
> > > > return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
> > > > }
> > > >
> > > > +static inline bool ublk_support_auto_zc(const struct ublk_queue *ubq)
> > > > +{
> > > > + return ubq->flags & UBLK_F_AUTO_ZERO_COPY;
> > > > +}
> > > > +
> > > > static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > > > {
> > > > return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > > > @@ -621,7 +636,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
> > > >
> > > > static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
> > > > {
> > > > - return !ublk_support_user_copy(ubq);
> > > > + return !ublk_support_user_copy(ubq) && !ublk_support_auto_zc(ubq);
> > > > }
> > > >
> > > > static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > > > @@ -629,17 +644,21 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> > > > /*
> > > > * read()/write() is involved in user copy, so request reference
> > > > * has to be grabbed
> > > > + *
> > > > + * For auto zc, ublk server still may issue UBLK_IO_COMMIT_AND_FETCH_REQ
> > > > + * before one registered buffer is used up, so reference is
> > > > + * required too.
> > > > */
> > > > - return ublk_support_user_copy(ubq);
> > > > + return ublk_support_user_copy(ubq) || ublk_support_auto_zc(ubq);
> > >
> > > Since both places where ublk_support_user_copy() is used are being
> > > adjusted to also check ublk_support_auto_zc(), maybe
> > > ublk_support_user_copy() should just be changed to check
> > > UBLK_F_AUTO_ZERO_COPY too?
> >
> > I think that isn't good, we should have let each flag to cover its feature only.
> >
> > That said it was not good to add zero copy check in both ublk_support_user_copy()
> > and ublk_dev_is_user_copy().
>
> Fair point.
>
> >
> > If ublk server needs user copy, it should have enabled it explicitly.
> >
> > >
> > > > }
> > > >
> > > > static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
> > > > - struct request *req)
> > > > + struct request *req, int init_ref)
> > > > {
> > > > if (ublk_need_req_ref(ubq)) {
> > > > struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > > >
> > > > - kref_init(&data->ref);
> > > > + refcount_set(&data->ref.refcount, init_ref);
> > >
> > > It might be nicer not to mix kref and raw reference count operations.
> > > Maybe you could add a prep patch that switches from struct kref to
> > > refcount_t?
> >
> > That is fine, or probably kref need to provide one variant of __kref_init(nr).
> >
> > >
> > > > }
> > > > }
> > > >
> > > > @@ -667,6 +686,15 @@ static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
> > > > }
> > > > }
> > > >
> > > > +/* for ublk zero copy */
> > > > +static void ublk_io_release(void *priv)
> > > > +{
> > > > + struct request *rq = priv;
> > > > + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > > > +
> > > > + ublk_put_req_ref(ubq, rq);
> > > > +}
> > > > +
> > > > static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
> > > > {
> > > > return ubq->flags & UBLK_F_NEED_GET_DATA;
> > > > @@ -1231,7 +1259,22 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
> > > > mapped_bytes >> 9;
> > > > }
> > > >
> > > > - ublk_init_req_ref(ubq, req);
> > > > + if (ublk_support_auto_zc(ubq) && ublk_rq_has_data(req)) {
> > > > + int ret;
> > > > +
> > > > + /* one extra reference is dropped by ublk_io_release */
> > > > + ublk_init_req_ref(ubq, req, 2);
> > > > + ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release,
> > > > + tag, issue_flags);
> > >
> > > Using the ublk request tag as the registered buffer index may be too
> > > limiting. It would cause collisions if there are multiple ublk devices
> > > or queues handled on a single io_uring. It also requires offsetting
> > > any registered user buffers so their indices come after all the ublk
> > > ones, which may be difficult if ublk devices are added dynamically.
> > > Perhaps the UBLK_IO_FETCH_REQ operation could specify the registered
> > > buffer index to use for the request?
> > >
> > > This also requires the io_uring issuing the zero-copy I/Os to be the
> > > same as the one submitting the ublk fetch requests. That would also
> > > make it difficult for us to adopt for our ublk server, which uses
> > > separate io_urings. Not sure if there is a simple way the ublk server
> > > could specify what io_uring to register the ublk zero-copy buffers
> > > with.
> >
> > I think it can be done by passing `io_uring fd` and buffer 'index' via
> > uring_cmd_header, there is still one u64 (->addr) left for zero copy,
> > then the buffer's `io_uring fd/fixed_fd` and 'index' can be provided
> > to io_uring core for registering buffer, this way should be flexible
> > enough for covering any case.
> >
> > >
> > > > + if (ret) {
> > > > + blk_mq_end_request(req, BLK_STS_IOERR);
> > > > + return;
> > >
> > > Does this leak the ublk fetch request? Seems like it should be
> >
> > It won't leak ublk uring_cmd which is just for delivering ublk
> > io command to ublk server.
>
> But where does the uring_cmd complete? The early return means this
> will skip the call to ubq_complete_io_cmd(). Won't that leave the
> uring_cmd stuck until the io_uring exits?
It is like nothing coming from /dev/ublkbN, we always submitted
one uring_cmd beforehand, when no one knows there will be request
available now or future.
>
> >
> > > completed to the ublk server with an error code.
> >
> > It could be hard for ublk server to deal with, I'd suggest to
> > start with one simple implementation first.
> >
> > It is usually one bug, and user will get notified from client's
> > failure, then complain and ublk server gets fixed.
> >
> > >
> > > > + }
> > > > + io->flags |= UBLK_IO_FLAG_UNREG_BUF;
> > > > + } else {
> > > > + ublk_init_req_ref(ubq, req, 1);
> > > > + }
> > > > +
> > > > ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
> > > > }
> > > >
> > > > @@ -1593,7 +1636,8 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
> > > > }
> > > >
> > > > static void ublk_commit_completion(struct ublk_device *ub,
> > > > - const struct ublksrv_io_cmd *ub_cmd)
> > > > + const struct ublksrv_io_cmd *ub_cmd,
> > > > + unsigned int issue_flags)
> > > > {
> > > > u32 qid = ub_cmd->q_id, tag = ub_cmd->tag;
> > > > struct ublk_queue *ubq = ublk_get_queue(ub, qid);
> > > > @@ -1604,6 +1648,15 @@ static void ublk_commit_completion(struct ublk_device *ub,
> > > > io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
> > > > io->res = ub_cmd->result;
> > > >
> > > > + if (io->flags & UBLK_IO_FLAG_UNREG_BUF) {
> > > > + int ret;
> > > > +
> > > > + ret = io_buffer_unregister_bvec(io->cmd, tag, issue_flags);
> > > > + if (ret)
> > > > + io->res = ret;
> > >
> > > I think it would be confusing to report an error to the application
> > > submitting the I/O if unregistration fails. The only scenario where it
> > > seems possible for this to fail is if userspace issues an
> > > IORING_REGISTER_BUFFERS_UPDATE that overwrites the registered buffer
> > > slot while the ublk request is being handled by the ublk server. I
> > > would either ignore any error from io_buffer_unregister_bvec() or
> > > return it to the ublk server.
> >
> > Fair enough, maybe WARN_ON_ONCE() is enough.
>
> I think it's technically triggerable from userspace, but certainly
> very unexpected.
Yeah, I agree.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-26 22:42 ` Caleb Sander Mateos
2025-04-27 2:06 ` Ming Lei
@ 2025-04-27 2:34 ` Keith Busch
2025-04-27 3:10 ` Ming Lei
1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2025-04-27 2:34 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Ming Lei, Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 03:42:59PM -0700, Caleb Sander Mateos wrote:
> On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> > register/unregister uring_cmd for each IO, this way is not only inefficient,
> > but also introduce dependency between buffer consumer and buffer register/
> > unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> > in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
>
> This is a great idea!
This is very similiar to something I proposed off-list, and the feedback
back then was this won't work because the back-end ring that wants to
use the zero-copy buffer isn't the same as the ublk server ring
recieving notification of a new command; the ublk driver has no idea
which uring to register the bvec with. Also, this is using the request
"tag" as the io_uring buf index, which wouldn't work when the ublk
server ring handles multiple ublk devices due to the tag collisions.
If you're can make those trade-offs, then this is a great simplification
to the whole thing.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-27 2:34 ` Keith Busch
@ 2025-04-27 3:10 ` Ming Lei
2025-04-27 4:04 ` Keith Busch
0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-04-27 3:10 UTC (permalink / raw)
To: Keith Busch; +Cc: Caleb Sander Mateos, Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 08:34:40PM -0600, Keith Busch wrote:
> On Sat, Apr 26, 2025 at 03:42:59PM -0700, Caleb Sander Mateos wrote:
> > On Sat, Apr 26, 2025 at 2:41 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer
> > > register/unregister uring_cmd for each IO, this way is not only inefficient,
> > > but also introduce dependency between buffer consumer and buffer register/
> > > unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c
> > > in which backing file IO has to be issued one by one by IOSQE_IO_LINK.
> >
> > This is a great idea!
>
> This is very similiar to something I proposed off-list, and the feedback
Looks we both think of it, :-)
> back then was this won't work because the back-end ring that wants to
> use the zero-copy buffer isn't the same as the ublk server ring
> recieving notification of a new command; the ublk driver has no idea
> which uring to register the bvec with. Also, this is using the request
> "tag" as the io_uring buf index, which wouldn't work when the ublk
> server ring handles multiple ublk devices due to the tag collisions.
>
> If you're can make those trade-offs, then this is a great simplification
> to the whole thing.
The io_uring fd & buffer index can be provided from 'ublksrv_io_cmd'.
https://lore.kernel.org/linux-block/aA2RNG3-WzuQqEN6@fedora/
If we only support IORING_ENTER_REGISTERED_RING, 32bit is enough for
io_uring fd & buffer index, and there is still 64bits available if not
taking UBLK_F_ZONED into account.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-27 3:10 ` Ming Lei
@ 2025-04-27 4:04 ` Keith Busch
2025-04-27 7:32 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Keith Busch @ 2025-04-27 4:04 UTC (permalink / raw)
To: Ming Lei; +Cc: Caleb Sander Mateos, Jens Axboe, linux-block, Uday Shankar
On Sun, Apr 27, 2025 at 11:10:30AM +0800, Ming Lei wrote:
> On Sat, Apr 26, 2025 at 08:34:40PM -0600, Keith Busch wrote:
> >
> > This is very similiar to something I proposed off-list, and the feedback
>
> Looks we both think of it, :-)
Yeah, for real. I was a bit dismayed when I learned of such use cases.
So much simplicity and elegance went away...
> > back then was this won't work because the back-end ring that wants to
> > use the zero-copy buffer isn't the same as the ublk server ring
> > recieving notification of a new command; the ublk driver has no idea
> > which uring to register the bvec with. Also, this is using the request
> > "tag" as the io_uring buf index, which wouldn't work when the ublk
> > server ring handles multiple ublk devices due to the tag collisions.
> >
> > If you're can make those trade-offs, then this is a great simplification
> > to the whole thing.
>
> The io_uring fd & buffer index can be provided from 'ublksrv_io_cmd'.
>
> https://lore.kernel.org/linux-block/aA2RNG3-WzuQqEN6@fedora/
>
> If we only support IORING_ENTER_REGISTERED_RING, 32bit is enough for
> io_uring fd & buffer index, and there is still 64bits available if not
> taking UBLK_F_ZONED into account.
We still need a registered sparse table for the backend ring. I think
maybe a simple ida from the ublk driver to select an index may let the
daemon register something reasonably small.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY
2025-04-27 4:04 ` Keith Busch
@ 2025-04-27 7:32 ` Ming Lei
0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2025-04-27 7:32 UTC (permalink / raw)
To: Keith Busch; +Cc: Caleb Sander Mateos, Jens Axboe, linux-block, Uday Shankar
On Sat, Apr 26, 2025 at 10:04:46PM -0600, Keith Busch wrote:
> On Sun, Apr 27, 2025 at 11:10:30AM +0800, Ming Lei wrote:
> > On Sat, Apr 26, 2025 at 08:34:40PM -0600, Keith Busch wrote:
> > >
> > > This is very similiar to something I proposed off-list, and the feedback
> >
> > Looks we both think of it, :-)
>
> Yeah, for real. I was a bit dismayed when I learned of such use cases.
> So much simplicity and elegance went away...
That is reality, and probably these use cases may be addressed elegantly too
in future...
>
> > > back then was this won't work because the back-end ring that wants to
> > > use the zero-copy buffer isn't the same as the ublk server ring
> > > recieving notification of a new command; the ublk driver has no idea
> > > which uring to register the bvec with. Also, this is using the request
> > > "tag" as the io_uring buf index, which wouldn't work when the ublk
> > > server ring handles multiple ublk devices due to the tag collisions.
> > >
> > > If you're can make those trade-offs, then this is a great simplification
> > > to the whole thing.
> >
> > The io_uring fd & buffer index can be provided from 'ublksrv_io_cmd'.
> >
> > https://lore.kernel.org/linux-block/aA2RNG3-WzuQqEN6@fedora/
> >
> > If we only support IORING_ENTER_REGISTERED_RING, 32bit is enough for
> > io_uring fd & buffer index, and there is still 64bits available if not
> > taking UBLK_F_ZONED into account.
>
> We still need a registered sparse table for the backend ring. I think
> maybe a simple ida from the ublk driver to select an index may let the
> daemon register something reasonably small.
Yeah, I think it is reasonable to let userspace register the sparse table,
and we can document it in UAPI.
thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 4/4] selftests: ublk: support UBLK_F_AUTO_ZERO_COPY
2025-04-26 9:41 [PATCH 0/4] ublk: two fixes and support UBLK_F_AUTO_ZERO_COPY Ming Lei
` (2 preceding siblings ...)
2025-04-26 9:41 ` [PATCH 3/4] ublk: add feature UBLK_F_AUTO_ZERO_COPY Ming Lei
@ 2025-04-26 9:41 ` Ming Lei
3 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2025-04-26 9:41 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Uday Shankar, Caleb Sander Mateos, Ming Lei
Enable UBLK_F_AUTO_ZERO_COPY support for ublk utility by argument
`--auto_zc`, meantime support this feature in null, loop and stripe
target code.
Add function test generic_08 for covering basic UBLK_F_AUTO_ZERO_COPY
feature.
Also cover UBLK_F_AUTO_ZERO_COPY in stress_03, stress_04 and stress_05
test too.
'fio/t/io_uring -p0 /dev/ublkb0' shows that F_AUTO_ZERO_COPY can improve
IOPS by 50% compared with F_SUPPORT_ZERO_COPY in my test VM.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
tools/testing/selftests/ublk/Makefile | 2 +
tools/testing/selftests/ublk/file_backed.c | 9 ++--
tools/testing/selftests/ublk/kublk.c | 13 ++++--
tools/testing/selftests/ublk/kublk.h | 6 +++
tools/testing/selftests/ublk/null.c | 43 ++++++++++++++-----
tools/testing/selftests/ublk/stripe.c | 14 +++---
.../testing/selftests/ublk/test_generic_08.sh | 28 ++++++++++++
.../testing/selftests/ublk/test_stress_03.sh | 6 +++
.../testing/selftests/ublk/test_stress_04.sh | 6 +++
.../testing/selftests/ublk/test_stress_05.sh | 8 ++++
10 files changed, 111 insertions(+), 24 deletions(-)
create mode 100755 tools/testing/selftests/ublk/test_generic_08.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index f34ac0bac696..8475716f407b 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -11,6 +11,8 @@ TEST_PROGS += test_generic_05.sh
TEST_PROGS += test_generic_06.sh
TEST_PROGS += test_generic_07.sh
+TEST_PROGS += test_generic_08.sh
+
TEST_PROGS += test_null_01.sh
TEST_PROGS += test_null_02.sh
TEST_PROGS += test_loop_01.sh
diff --git a/tools/testing/selftests/ublk/file_backed.c b/tools/testing/selftests/ublk/file_backed.c
index 6f34eabfae97..0e1b9f670403 100644
--- a/tools/testing/selftests/ublk/file_backed.c
+++ b/tools/testing/selftests/ublk/file_backed.c
@@ -29,11 +29,12 @@ static int loop_queue_flush_io(struct ublk_queue *q, const struct ublksrv_io_des
static int loop_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_desc *iod, int tag)
{
unsigned ublk_op = ublksrv_get_op(iod);
- int zc = ublk_queue_use_zc(q);
- enum io_uring_op op = ublk_to_uring_op(iod, zc);
+ unsigned zc = ublk_queue_use_zc(q);
+ unsigned auto_zc = ublk_queue_use_auto_zc(q);
+ enum io_uring_op op = ublk_to_uring_op(iod, zc | auto_zc);
struct io_uring_sqe *sqe[3];
- if (!zc) {
+ if (!zc || auto_zc) {
ublk_queue_alloc_sqes(q, sqe, 1);
if (!sqe[0])
return -ENOMEM;
@@ -42,6 +43,8 @@ static int loop_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_de
(void *)iod->addr,
iod->nr_sectors << 9,
iod->start_sector << 9);
+ if (auto_zc)
+ sqe[0]->buf_index = tag;
io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
/* bit63 marks us as tgt io */
sqe[0]->user_data = build_user_data(tag, ublk_op, 0, 1);
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index 701b47f98902..fe475a76e669 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -420,9 +420,12 @@ static int ublk_queue_init(struct ublk_queue *q)
q->cmd_inflight = 0;
q->tid = gettid();
- if (dev->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) {
+ if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY)) {
q->state |= UBLKSRV_NO_BUF;
- q->state |= UBLKSRV_ZC;
+ if (dev->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY)
+ q->state |= UBLKSRV_ZC;
+ if (dev->dev_info.flags & UBLK_F_AUTO_ZERO_COPY)
+ q->state |= UBLKSRV_AUTO_ZC;
}
cmd_buf_size = ublk_queue_cmd_buf_sz(q);
@@ -461,7 +464,7 @@ static int ublk_queue_init(struct ublk_queue *q)
goto fail;
}
- if (dev->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) {
+ if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_ZERO_COPY)) {
ret = io_uring_register_buffers_sparse(&q->ring, q->q_depth);
if (ret) {
ublk_err("ublk dev %d queue %d register spare buffers failed %d",
@@ -1203,6 +1206,7 @@ static int cmd_dev_get_features(void)
[const_ilog2(UBLK_F_USER_COPY)] = "USER_COPY",
[const_ilog2(UBLK_F_ZONED)] = "ZONED",
[const_ilog2(UBLK_F_USER_RECOVERY_FAIL_IO)] = "RECOVERY_FAIL_IO",
+ [const_ilog2(UBLK_F_AUTO_ZERO_COPY)] = "AUTO_ZC",
};
struct ublk_dev *dev;
__u64 features = 0;
@@ -1297,6 +1301,7 @@ int main(int argc, char *argv[])
{ "recovery_fail_io", 1, NULL, 'e'},
{ "recovery_reissue", 1, NULL, 'i'},
{ "get_data", 1, NULL, 'g'},
+ { "auto_zc", 0, NULL, 0},
{ 0, 0, 0, 0 }
};
const struct ublk_tgt_ops *ops = NULL;
@@ -1367,6 +1372,8 @@ int main(int argc, char *argv[])
ublk_dbg_mask = 0;
if (!strcmp(longopts[option_idx].name, "foreground"))
ctx.fg = 1;
+ if (!strcmp(longopts[option_idx].name, "auto_zc"))
+ ctx.flags |= UBLK_F_AUTO_ZERO_COPY;
break;
case '?':
/*
diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
index 44ee1e4ac55b..4c97ee6aafa2 100644
--- a/tools/testing/selftests/ublk/kublk.h
+++ b/tools/testing/selftests/ublk/kublk.h
@@ -169,6 +169,7 @@ struct ublk_queue {
#define UBLKSRV_QUEUE_IDLE (1U << 1)
#define UBLKSRV_NO_BUF (1U << 2)
#define UBLKSRV_ZC (1U << 3)
+#define UBLKSRV_AUTO_ZC (1U << 4)
unsigned state;
pid_t tid;
pthread_t thread;
@@ -388,6 +389,11 @@ static inline int ublk_queue_use_zc(const struct ublk_queue *q)
return q->state & UBLKSRV_ZC;
}
+static inline int ublk_queue_use_auto_zc(const struct ublk_queue *q)
+{
+ return q->state & UBLKSRV_AUTO_ZC;
+}
+
extern const struct ublk_tgt_ops null_tgt_ops;
extern const struct ublk_tgt_ops loop_tgt_ops;
extern const struct ublk_tgt_ops stripe_tgt_ops;
diff --git a/tools/testing/selftests/ublk/null.c b/tools/testing/selftests/ublk/null.c
index 91fec3690d4b..1362dd422c6e 100644
--- a/tools/testing/selftests/ublk/null.c
+++ b/tools/testing/selftests/ublk/null.c
@@ -42,10 +42,22 @@ static int ublk_null_tgt_init(const struct dev_ctx *ctx, struct ublk_dev *dev)
return 0;
}
+static void __setup_nop_io(int tag, const struct ublksrv_io_desc *iod,
+ struct io_uring_sqe *sqe)
+{
+ unsigned ublk_op = ublksrv_get_op(iod);
+
+ io_uring_prep_nop(sqe);
+ sqe->buf_index = tag;
+ sqe->flags |= IOSQE_FIXED_FILE;
+ sqe->rw_flags = IORING_NOP_FIXED_BUFFER | IORING_NOP_INJECT_RESULT;
+ sqe->len = iod->nr_sectors << 9; /* injected result */
+ sqe->user_data = build_user_data(tag, ublk_op, 0, 1);
+}
+
static int null_queue_zc_io(struct ublk_queue *q, int tag)
{
const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
- unsigned ublk_op = ublksrv_get_op(iod);
struct io_uring_sqe *sqe[3];
ublk_queue_alloc_sqes(q, sqe, 3);
@@ -55,12 +67,8 @@ static int null_queue_zc_io(struct ublk_queue *q, int tag)
ublk_cmd_op_nr(sqe[0]->cmd_op), 0, 1);
sqe[0]->flags |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_HARDLINK;
- io_uring_prep_nop(sqe[1]);
- sqe[1]->buf_index = tag;
- sqe[1]->flags |= IOSQE_FIXED_FILE | IOSQE_IO_HARDLINK;
- sqe[1]->rw_flags = IORING_NOP_FIXED_BUFFER | IORING_NOP_INJECT_RESULT;
- sqe[1]->len = iod->nr_sectors << 9; /* injected result */
- sqe[1]->user_data = build_user_data(tag, ublk_op, 0, 1);
+ __setup_nop_io(tag, iod, sqe[1]);
+ sqe[1]->flags |= IOSQE_IO_HARDLINK;
io_uring_prep_buf_unregister(sqe[2], 0, tag, q->q_id, tag);
sqe[2]->user_data = build_user_data(tag, ublk_cmd_op_nr(sqe[2]->cmd_op), 0, 1);
@@ -69,6 +77,16 @@ static int null_queue_zc_io(struct ublk_queue *q, int tag)
return 2;
}
+static int null_queue_auto_zc_io(struct ublk_queue *q, int tag)
+{
+ const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+ struct io_uring_sqe *sqe[1];
+
+ ublk_queue_alloc_sqes(q, sqe, 1);
+ __setup_nop_io(tag, iod, sqe[0]);
+ return 1;
+}
+
static void ublk_null_io_done(struct ublk_queue *q, int tag,
const struct io_uring_cqe *cqe)
{
@@ -94,15 +112,18 @@ static void ublk_null_io_done(struct ublk_queue *q, int tag,
static int ublk_null_queue_io(struct ublk_queue *q, int tag)
{
const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
- int zc = ublk_queue_use_zc(q);
+ unsigned auto_zc = ublk_queue_use_auto_zc(q);
+ unsigned zc = ublk_queue_use_zc(q);
int queued;
- if (!zc) {
+ if (auto_zc)
+ queued = null_queue_auto_zc_io(q, tag);
+ else if (zc)
+ queued = null_queue_zc_io(q, tag);
+ else {
ublk_complete_io(q, tag, iod->nr_sectors << 9);
return 0;
}
-
- queued = null_queue_zc_io(q, tag);
ublk_queued_tgt_io(q, tag, queued);
return 0;
}
diff --git a/tools/testing/selftests/ublk/stripe.c b/tools/testing/selftests/ublk/stripe.c
index 5dbd6392d83d..ba3d7789ac21 100644
--- a/tools/testing/selftests/ublk/stripe.c
+++ b/tools/testing/selftests/ublk/stripe.c
@@ -126,8 +126,9 @@ static inline enum io_uring_op stripe_to_uring_op(
static int stripe_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_desc *iod, int tag)
{
const struct stripe_conf *conf = get_chunk_shift(q);
- int zc = !!(ublk_queue_use_zc(q) != 0);
- enum io_uring_op op = stripe_to_uring_op(iod, zc);
+ unsigned auto_zc = (ublk_queue_use_auto_zc(q) != 0);
+ unsigned zc = (ublk_queue_use_zc(q) != 0);
+ enum io_uring_op op = stripe_to_uring_op(iod, zc | auto_zc);
struct io_uring_sqe *sqe[NR_STRIPE];
struct stripe_array *s = alloc_stripe_array(conf, iod);
struct ublk_io *io = ublk_get_io(q, tag);
@@ -153,12 +154,11 @@ static int stripe_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_
(void *)t->vec,
t->nr_vec,
t->start << 9);
- if (zc) {
+ io_uring_sqe_set_flags(sqe[i], IOSQE_FIXED_FILE);
+ if (auto_zc || zc) {
sqe[i]->buf_index = tag;
- io_uring_sqe_set_flags(sqe[i],
- IOSQE_FIXED_FILE | IOSQE_IO_HARDLINK);
- } else {
- io_uring_sqe_set_flags(sqe[i], IOSQE_FIXED_FILE);
+ if (zc)
+ sqe[i]->flags |= IOSQE_IO_HARDLINK;
}
/* bit63 marks us as tgt io */
sqe[i]->user_data = build_user_data(tag, ublksrv_get_op(iod), i - zc, 1);
diff --git a/tools/testing/selftests/ublk/test_generic_08.sh b/tools/testing/selftests/ublk/test_generic_08.sh
new file mode 100755
index 000000000000..5c54331ad900
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_08.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_08"
+ERR_CODE=0
+
+_prep_test "generic" "test UBLK_F_AUTO_ZERO_COPY"
+
+_create_backfile 0 256M
+_create_backfile 1 256M
+
+dev_id=$(_add_ublk_dev -t loop -q 2 --auto_zc "${UBLK_BACKFILES[0]}")
+_check_add_dev $TID $?
+
+if ! _mkfs_mount_test /dev/ublkb"${dev_id}"; then
+ _cleanup_test "generic"
+ _show_result $TID 255
+fi
+
+dev_id=$(_add_ublk_dev -t stripe --auto_zc "${UBLK_BACKFILES[0]}" "${UBLK_BACKFILES[1]}")
+_check_add_dev $TID $?
+_mkfs_mount_test /dev/ublkb"${dev_id}"
+ERR_CODE=$?
+
+_cleanup_test "generic"
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_stress_03.sh b/tools/testing/selftests/ublk/test_stress_03.sh
index e0854f71d35b..11173f4e9842 100755
--- a/tools/testing/selftests/ublk/test_stress_03.sh
+++ b/tools/testing/selftests/ublk/test_stress_03.sh
@@ -32,6 +32,12 @@ _create_backfile 2 128M
ublk_io_and_remove 8G -t null -q 4 -z &
ublk_io_and_remove 256M -t loop -q 4 -z "${UBLK_BACKFILES[0]}" &
ublk_io_and_remove 256M -t stripe -q 4 -z "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+
+if _have_feature "AUTO_ZC"; then
+ ublk_io_and_remove 8G -t null -q 4 --auto_zc &
+ ublk_io_and_remove 256M -t loop -q 4 --auto_zc "${UBLK_BACKFILES[0]}" &
+ ublk_io_and_remove 256M -t stripe -q 4 --auto_zc "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+fi
wait
_cleanup_test "stress"
diff --git a/tools/testing/selftests/ublk/test_stress_04.sh b/tools/testing/selftests/ublk/test_stress_04.sh
index 1798a98387e8..c219753513d1 100755
--- a/tools/testing/selftests/ublk/test_stress_04.sh
+++ b/tools/testing/selftests/ublk/test_stress_04.sh
@@ -31,6 +31,12 @@ _create_backfile 2 128M
ublk_io_and_kill_daemon 8G -t null -q 4 -z &
ublk_io_and_kill_daemon 256M -t loop -q 4 -z "${UBLK_BACKFILES[0]}" &
ublk_io_and_kill_daemon 256M -t stripe -q 4 -z "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+
+if _have_feature "AUTO_ZC"; then
+ ublk_io_and_kill_daemon 8G -t null -q 4 --auto_zc &
+ ublk_io_and_kill_daemon 256M -t loop -q 4 --auto_zc "${UBLK_BACKFILES[0]}" &
+ ublk_io_and_kill_daemon 256M -t stripe -q 4 --auto_zc "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+fi
wait
_cleanup_test "stress"
diff --git a/tools/testing/selftests/ublk/test_stress_05.sh b/tools/testing/selftests/ublk/test_stress_05.sh
index a7071b10224d..878db3097f6e 100755
--- a/tools/testing/selftests/ublk/test_stress_05.sh
+++ b/tools/testing/selftests/ublk/test_stress_05.sh
@@ -60,5 +60,13 @@ if _have_feature "ZERO_COPY"; then
done
fi
+if _have_feature "AUTO_ZC"; then
+ for reissue in $(seq 0 1); do
+ ublk_io_and_remove 8G -t null -q 4 -g 1 --auto_zc -r 1 -i "$reissue" &
+ ublk_io_and_remove 256M -t loop -q 4 -g 1 --auto_zc -r 1 -i "$reissue" "${UBLK_BACKFILES[1]}" &
+ wait
+ done
+fi
+
_cleanup_test "stress"
_show_result $TID $ERR_CODE
--
2.47.0
^ permalink raw reply related [flat|nested] 19+ messages in thread