* [PATCH 0/8] ublk: cleanup & improvement & zc follow-up
@ 2025-03-24 13:48 Ming Lei
  2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

Hi Jens,

The first three patches are small cleanups.

The 4th & 5th patches are zero-copy (zc) follow-ups.

The 6th patch implements ->queue_rqs() and improves IOPS by more than 10%.

The last two patches add selftests for ->queue_rqs() and the segment
parameter change.

Each one is straightforward.

Ming Lei (8):
  ublk: remove two unused fields from 'struct ublk_queue'
  ublk: add helper of ublk_need_map_io()
  ublk: truncate io command result
  ublk: add segment parameter
  ublk: document zero copy feature
  ublk: implement ->queue_rqs()
  selftests: ublk: add more tests for covering MQ
  selftests: ublk: add test for checking zero copy related parameter

 Documentation/block/ublk.rst                  |  28 +++--
 drivers/block/ublk_drv.c                      | 119 +++++++++++++++---
 include/uapi/linux/ublk_cmd.h                 |   9 ++
 tools/testing/selftests/ublk/Makefile         |   4 +
 tools/testing/selftests/ublk/null.c           |  11 +-
 tools/testing/selftests/ublk/test_common.sh   |   6 +
 .../testing/selftests/ublk/test_generic_02.sh |  44 +++++++
 .../testing/selftests/ublk/test_generic_03.sh |  28 +++++
 tools/testing/selftests/ublk/test_loop_01.sh  |  14 +--
 tools/testing/selftests/ublk/test_loop_03.sh  |  14 +--
 tools/testing/selftests/ublk/test_loop_05.sh  |  28 +++++
 .../testing/selftests/ublk/test_stripe_01.sh  |  14 +--
 .../testing/selftests/ublk/test_stripe_03.sh  |  30 +++++
 13 files changed, 298 insertions(+), 51 deletions(-)
 create mode 100755 tools/testing/selftests/ublk/test_generic_02.sh
 create mode 100755 tools/testing/selftests/ublk/test_generic_03.sh
 create mode 100755 tools/testing/selftests/ublk/test_loop_05.sh
 create mode 100755 tools/testing/selftests/ublk/test_stripe_03.sh

-- 
2.47.0



* [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue'
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
@ 2025-03-24 13:48 ` Ming Lei
  2025-03-24 14:32   ` Caleb Sander Mateos
  2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

From: Ming Lei <tom.leiming@gmail.com>

Remove two unused fields (`io_addr` & `max_io_sz`) from `struct ublk_queue`.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c060da409ed8..2ea1ee209b64 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -143,8 +143,6 @@ struct ublk_queue {
 	struct task_struct	*ubq_daemon;
 	char *io_cmd_buf;
 
-	unsigned long io_addr;	/* mapped vm address */
-	unsigned int max_io_sz;
 	bool force_abort;
 	bool timeout;
 	bool canceling;
-- 
2.47.0



* [PATCH 2/8] ublk: add helper of ublk_need_map_io()
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
  2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
@ 2025-03-24 13:48 ` Ming Lei
  2025-03-24 15:01   ` Caleb Sander Mateos
  2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei,
	Ming Lei

From: Ming Lei <tom.leiming@gmail.com>

ublk_need_map_io() is more readable than open-coding
!ublk_support_user_copy().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2ea1ee209b64..6fa1384c6436 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -596,6 +596,11 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 	return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }
 
+static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
+{
+	return !ublk_support_user_copy(ubq);
+}
+
 static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
 {
 	/*
@@ -923,7 +928,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
 
-	if (ublk_support_user_copy(ubq))
+	if (!ublk_need_map_io(ubq))
 		return rq_bytes;
 
 	/*
@@ -947,7 +952,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
 
-	if (ublk_support_user_copy(ubq))
+	if (!ublk_need_map_io(ubq))
 		return rq_bytes;
 
 	if (ublk_need_unmap_req(req)) {
@@ -1843,7 +1848,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
 			goto out;
 
-		if (!ublk_support_user_copy(ubq)) {
+		if (ublk_need_map_io(ubq)) {
 			/*
 			 * FETCH_RQ has to provide IO buffer if NEED GET
 			 * DATA is not enabled
@@ -1865,7 +1870,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
 			goto out;
 
-		if (!ublk_support_user_copy(ubq)) {
+		if (ublk_need_map_io(ubq)) {
 			/*
 			 * COMMIT_AND_FETCH_REQ has to provide IO buffer if
 			 * NEED GET DATA is not enabled or it is Read IO.
-- 
2.47.0



* [PATCH 3/8] ublk: truncate io command result
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
  2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
  2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
@ 2025-03-24 13:48 ` Ming Lei
  2025-03-24 15:51   ` Caleb Sander Mateos
  2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

If the io command result is bigger than the request bytes, truncate it to
the request bytes. This is more reliable and avoids potential risk, even
though both blk_update_request() and ublk_copy_user_pages() work fine
either way.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6fa1384c6436..acb6aed7be75 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1071,6 +1071,10 @@ static inline void __ublk_complete_rq(struct request *req)
 		goto exit;
 	}
 
+	/* truncate result in case it is bigger than request bytes */
+	if (io->res > blk_rq_bytes(req))
+		io->res = blk_rq_bytes(req);
+
 	/*
 	 * FLUSH, DISCARD or WRITE_ZEROES usually won't return bytes returned, so end them
 	 * directly.
-- 
2.47.0



* [PATCH 4/8] ublk: add segment parameter
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
                   ` (2 preceding siblings ...)
  2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
@ 2025-03-24 13:48 ` Ming Lei
  2025-03-24 22:26   ` Caleb Sander Mateos
  2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

IO split is usually bad in the io_uring world, since it causes -EAGAIN and
IO handling may have to fall back to io-wq, which hurts performance.

ublk recently started to support zero copy. To avoid unnecessary IO split,
the ublk driver's segment limit should be aligned with the backend
device's segment limit.

Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
whose count matches the ublk request's segment count, so a big memory
allocation can be avoided by setting a reasonable max_segments limit.

So add a segment parameter to give the ublk server a chance to align its
segment limit with the backend, and to keep the limit reasonable from an
implementation viewpoint.
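
For illustration, a ublk server whose backend has, say, a 64KB max segment
size and 128 segments could set the new parameter as below (just a sketch
with made-up values; `p` points at the `struct ublk_params` being prepared
before UBLK_CMD_SET_PARAMS):

	p->types |= UBLK_PARAM_TYPE_SEGMENT;
	p->seg = (struct ublk_param_segment) {
		/* block layer style default boundary: 4G - 1 */
		.seg_boundary_mask	= 0xffffffff,
		/* match the backend's max segment size */
		.max_segment_size	= 64 << 10,
		/* bounds the bvec allocation in io_buffer_register_bvec() */
		.max_segments		= 128,
	};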

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c      | 15 ++++++++++++++-
 include/uapi/linux/ublk_cmd.h |  9 +++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index acb6aed7be75..53a463681a41 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -74,7 +74,7 @@
 #define UBLK_PARAM_TYPE_ALL                                \
 	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
 	 UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
-	 UBLK_PARAM_TYPE_DMA_ALIGN)
+	 UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
 
 struct ublk_rq_data {
 	struct kref ref;
@@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
 			return -EINVAL;
 	}
 
+	if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
+		const struct ublk_param_segment *p = &ub->params.seg;
+
+		if (!is_power_of_2(p->seg_boundary_mask + 1))
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -2350,6 +2357,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 	if (ub->params.types & UBLK_PARAM_TYPE_DMA_ALIGN)
 		lim.dma_alignment = ub->params.dma.alignment;
 
+	if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
+		lim.seg_boundary_mask = ub->params.seg.seg_boundary_mask;
+		lim.max_segment_size = ub->params.seg.max_segment_size;
+		lim.max_segments = ub->params.seg.max_segments;
+	}
+
 	if (wait_for_completion_interruptible(&ub->completion) != 0)
 		return -EINTR;
 
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 7255b36b5cf6..83c2b94251f0 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -410,6 +410,13 @@ struct ublk_param_dma_align {
 	__u8	pad[4];
 };
 
+struct ublk_param_segment {
+	__u64 	seg_boundary_mask;
+	__u32 	max_segment_size;
+	__u16 	max_segments;
+	__u8	pad[2];
+};
+
 struct ublk_params {
 	/*
 	 * Total length of parameters, userspace has to set 'len' for both
@@ -423,6 +430,7 @@ struct ublk_params {
 #define UBLK_PARAM_TYPE_DEVT            (1 << 2)
 #define UBLK_PARAM_TYPE_ZONED           (1 << 3)
 #define UBLK_PARAM_TYPE_DMA_ALIGN       (1 << 4)
+#define UBLK_PARAM_TYPE_SEGMENT         (1 << 5)
 	__u32	types;			/* types of parameter included */
 
 	struct ublk_param_basic		basic;
@@ -430,6 +438,7 @@ struct ublk_params {
 	struct ublk_param_devt		devt;
 	struct ublk_param_zoned	zoned;
 	struct ublk_param_dma_align	dma;
+	struct ublk_param_segment	seg;
 };
 
 #endif
-- 
2.47.0



* [PATCH 5/8] ublk: document zero copy feature
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
                   ` (3 preceding siblings ...)
  2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
@ 2025-03-24 13:49 ` Ming Lei
  2025-03-25 15:26   ` Caleb Sander Mateos
  2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:49 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

Add words to explain how the zero copy feature works, and why the ublk
server has to be trusted for handling the IO read command.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 1e0e7358e14a..33efff25b54d 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -309,18 +309,30 @@ with specified IO tag in the command data:
   ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
   the server buffer (pages) read to the IO request pages.
 
-Future development
-==================
-
 Zero copy
 ---------
 
-Zero copy is a generic requirement for nbd, fuse or similar drivers. A
-problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
-can't be remapped any more in kernel with existing mm interfaces. This can
-occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
-big requests (IO size >= 256 KB) may benefit a lot from zero copy.
+ublk zero copy relies on io_uring's fixed kernel buffer, which provides
+two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.
+
+ublk adds the `UBLK_IO_REGISTER_IO_BUF` IO command, which calls
+`io_buffer_register_bvec()` so that the ublk server can register the client
+request buffer in the io_uring buffer table; the server can then submit
+io_uring IOs with the registered buffer index. The `UBLK_IO_UNREGISTER_IO_BUF`
+command calls `io_buffer_unregister_bvec()` to unregister the buffer.
+
+A ublk server implementing zero copy has to have CAP_SYS_ADMIN and be
+trusted, because it is the server's responsibility to make sure the IO
+buffer is filled with data, and to handle short reads correctly by
+returning the number of bytes actually filled. Otherwise, an
+uninitialized kernel buffer will be exposed to the client application.
+
+The ublk server needs to align the `struct ublk_param_dma_align` parameter
+with the backend for zero copy to work correctly.
 
+For best IO performance, the ublk server should also align its
+`struct ublk_param_segment` parameter with the backend to avoid
+unnecessary IO split.
 
 References
 ==========
-- 
2.47.0



* [PATCH 6/8] ublk: implement ->queue_rqs()
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
                   ` (4 preceding siblings ...)
  2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
@ 2025-03-24 13:49 ` Ming Lei
  2025-03-24 17:07   ` Uday Shankar
  2025-03-26 21:30   ` Caleb Sander Mateos
  2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
  2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei
  7 siblings, 2 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:49 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

Implement ->queue_rqs() to improve performance in the MQ case.

This way, we only need to call io_uring_cmd_complete_in_task() once per
batch, and both io_uring and the ublk server can see the exact batch
issued from the client side.

IOPS improvement is as follows:

- tests

	tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]

	fio/t/io_uring -p0 /dev/ublkb0

- results:

	more than 10% IOPS boost observed

Passes all ublk selftests, especially the io dispatch order test.

Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 77 insertions(+), 8 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 53a463681a41..86621fde7fde 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -83,6 +83,7 @@ struct ublk_rq_data {
 struct ublk_uring_cmd_pdu {
 	struct ublk_queue *ubq;
 	u16 tag;
+	struct rq_list list;
 };
 
 /*
@@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 	io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
 }
 
+static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
+		unsigned int issue_flags)
+{
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
+	struct ublk_queue *ubq = pdu->ubq;
+	struct request *rq;
+
+	while ((rq = rq_list_pop(&pdu->list))) {
+		struct ublk_io *io = &ubq->ios[rq->tag];
+
+		ublk_rq_task_work_cb(io->cmd, issue_flags);
+	}
+}
+
+static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
+{
+	struct request *rq = l->head;
+	struct ublk_io *io = &ubq->ios[rq->tag];
+	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
+
+	pdu->ubq = ubq;
+	pdu->list = *l;
+	rq_list_init(l);
+	io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);
+}
+
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 {
 	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
@@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 	return BLK_EH_RESET_TIMER;
 }
 
-static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
-		const struct blk_mq_queue_data *bd)
+static blk_status_t ublk_prep_rq_batch(struct request *rq)
 {
-	struct ublk_queue *ubq = hctx->driver_data;
-	struct request *rq = bd->rq;
+	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
 	blk_status_t res;
 
-	if (unlikely(ubq->fail_io)) {
+	if (unlikely(ubq->fail_io))
 		return BLK_STS_TARGET;
-	}
 
 	/* fill iod to slot in io cmd buffer */
 	res = ublk_setup_iod(ubq, rq);
@@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
 		return BLK_STS_IOERR;
 
+	if (unlikely(ubq->canceling))
+		return BLK_STS_IOERR;
+
+	blk_mq_start_request(rq);
+	return BLK_STS_OK;
+}
+
+static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
+		const struct blk_mq_queue_data *bd)
+{
+	struct ublk_queue *ubq = hctx->driver_data;
+	struct request *rq = bd->rq;
+	blk_status_t res;
+
 	if (unlikely(ubq->canceling)) {
 		__ublk_abort_rq(ubq, rq);
 		return BLK_STS_OK;
 	}
 
-	blk_mq_start_request(bd->rq);
-	ublk_queue_cmd(ubq, rq);
+	res = ublk_prep_rq_batch(rq);
+	if (res != BLK_STS_OK)
+		return res;
 
+	ublk_queue_cmd(ubq, rq);
 	return BLK_STS_OK;
 }
 
+static void ublk_queue_rqs(struct rq_list *rqlist)
+{
+	struct rq_list requeue_list = { };
+	struct rq_list submit_list = { };
+	struct ublk_queue *ubq = NULL;
+	struct request *req;
+
+	while ((req = rq_list_pop(rqlist))) {
+		struct ublk_queue *this_q = req->mq_hctx->driver_data;
+
+		if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
+			ublk_queue_cmd_list(ubq, &submit_list);
+		ubq = this_q;
+
+		if (ublk_prep_rq_batch(req) == BLK_STS_OK)
+			rq_list_add_tail(&submit_list, req);
+		else
+			rq_list_add_tail(&requeue_list, req);
+	}
+
+	if (ubq && !rq_list_empty(&submit_list))
+		ublk_queue_cmd_list(ubq, &submit_list);
+	*rqlist = requeue_list;
+}
+
 static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
 		unsigned int hctx_idx)
 {
@@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
 
 static const struct blk_mq_ops ublk_mq_ops = {
 	.queue_rq       = ublk_queue_rq,
+	.queue_rqs      = ublk_queue_rqs,
 	.init_hctx	= ublk_init_hctx,
 	.timeout	= ublk_timeout,
 };
@@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
 	BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
 			UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
 
+	BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
+			sizeof_field(struct io_uring_cmd, pdu));
+
 	init_waitqueue_head(&ublk_idr_wq);
 
 	ret = misc_register(&ublk_misc);
-- 
2.47.0



* [PATCH 7/8] selftests: ublk: add more tests for covering MQ
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
                   ` (5 preceding siblings ...)
  2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
@ 2025-03-24 13:49 ` Ming Lei
  2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei
  7 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:49 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

Add test_generic_02.sh for covering IO dispatch order in the MQ case.

This matters especially now that we support ->queue_rqs(), which may
affect IO dispatch order.

Add test_loop_05.sh and test_stripe_03.sh for covering MQ.
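
The new tests are hooked into the selftests Makefile, so they can be run
together with the rest of the ublk tests, e.g. via a typical kselftest
invocation:

	make -C tools/testing/selftests TARGETS=ublk run_tests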

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/testing/selftests/ublk/Makefile         |  3 ++
 tools/testing/selftests/ublk/test_common.sh   |  6 +++
 .../testing/selftests/ublk/test_generic_02.sh | 44 +++++++++++++++++++
 tools/testing/selftests/ublk/test_loop_01.sh  | 14 +++---
 tools/testing/selftests/ublk/test_loop_03.sh  | 14 +++---
 tools/testing/selftests/ublk/test_loop_05.sh  | 28 ++++++++++++
 .../testing/selftests/ublk/test_stripe_01.sh  | 14 +++---
 .../testing/selftests/ublk/test_stripe_03.sh  | 30 +++++++++++++
 8 files changed, 126 insertions(+), 27 deletions(-)
 create mode 100755 tools/testing/selftests/ublk/test_generic_02.sh
 create mode 100755 tools/testing/selftests/ublk/test_loop_05.sh
 create mode 100755 tools/testing/selftests/ublk/test_stripe_03.sh

diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 7817afe29005..7a8c994de244 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -4,6 +4,7 @@ CFLAGS += -O3 -Wl,-no-as-needed -Wall -I $(top_srcdir)
 LDLIBS += -lpthread -lm -luring
 
 TEST_PROGS := test_generic_01.sh
+TEST_PROGS += test_generic_02.sh
 
 TEST_PROGS += test_null_01.sh
 TEST_PROGS += test_null_02.sh
@@ -11,8 +12,10 @@ TEST_PROGS += test_loop_01.sh
 TEST_PROGS += test_loop_02.sh
 TEST_PROGS += test_loop_03.sh
 TEST_PROGS += test_loop_04.sh
+TEST_PROGS += test_loop_05.sh
 TEST_PROGS += test_stripe_01.sh
 TEST_PROGS += test_stripe_02.sh
+TEST_PROGS += test_stripe_03.sh
 
 TEST_PROGS += test_stress_01.sh
 TEST_PROGS += test_stress_02.sh
diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh
index 75f54ac6b1c4..a88b35943227 100755
--- a/tools/testing/selftests/ublk/test_common.sh
+++ b/tools/testing/selftests/ublk/test_common.sh
@@ -23,6 +23,12 @@ _get_disk_dev_t() {
 	echo $(( (major & 0xfff) << 20 | (minor & 0xfffff) ))
 }
 
+_run_fio_verify_io() {
+	fio --name=verify --rw=randwrite --direct=1 --ioengine=libaio \
+		--bs=8k --iodepth=32 --verify=crc32c --do_verify=1 \
+		--verify_state_save=0 "$@" > /dev/null
+}
+
 _create_backfile() {
 	local my_size=$1
 	local my_file
diff --git a/tools/testing/selftests/ublk/test_generic_02.sh b/tools/testing/selftests/ublk/test_generic_02.sh
new file mode 100755
index 000000000000..3e80121e3bf5
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_02.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_02"
+ERR_CODE=0
+
+if ! _have_program bpftrace; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "null" "sequential io order for MQ"
+
+dev_id=$(_add_ublk_dev -t null -q 2)
+_check_add_dev $TID $?
+
+dev_t=$(_get_disk_dev_t "$dev_id")
+bpftrace trace/seq_io.bt "$dev_t" "W" 1 > "$UBLK_TMP" 2>&1 &
+btrace_pid=$!
+sleep 2
+
+if ! kill -0 "$btrace_pid" > /dev/null 2>&1; then
+	_cleanup_test "null"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# run fio over this ublk disk
+fio --name=write_seq \
+    --filename=/dev/ublkb"${dev_id}" \
+    --ioengine=libaio --iodepth=16 \
+    --rw=write \
+    --size=512M \
+    --direct=1 \
+    --bs=4k > /dev/null 2>&1
+ERR_CODE=$?
+kill "$btrace_pid"
+wait
+if grep -q "io_out_of_order" "$UBLK_TMP"; then
+	cat "$UBLK_TMP"
+	ERR_CODE=255
+fi
+_cleanup_test "null"
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_loop_01.sh b/tools/testing/selftests/ublk/test_loop_01.sh
index c882d2a08e13..1ef8b6044777 100755
--- a/tools/testing/selftests/ublk/test_loop_01.sh
+++ b/tools/testing/selftests/ublk/test_loop_01.sh
@@ -6,6 +6,10 @@
 TID="loop_01"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "loop" "write and verify test"
 
 backfile_0=$(_create_backfile 256M)
@@ -14,15 +18,7 @@ dev_id=$(_add_ublk_dev -t loop "$backfile_0")
 _check_add_dev $TID $? "${backfile_0}"
 
 # run fio over the ublk disk
-fio --name=write_and_verify \
-    --filename=/dev/ublkb"${dev_id}" \
-    --ioengine=libaio --iodepth=16 \
-    --rw=write \
-    --size=256M \
-    --direct=1 \
-    --verify=crc32c \
-    --do_verify=1 \
-    --bs=4k > /dev/null 2>&1
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M
 ERR_CODE=$?
 
 _cleanup_test "loop"
diff --git a/tools/testing/selftests/ublk/test_loop_03.sh b/tools/testing/selftests/ublk/test_loop_03.sh
index 269c96787d7d..e9ca744de8b1 100755
--- a/tools/testing/selftests/ublk/test_loop_03.sh
+++ b/tools/testing/selftests/ublk/test_loop_03.sh
@@ -6,6 +6,10 @@
 TID="loop_03"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "loop" "write and verify over zero copy"
 
 backfile_0=$(_create_backfile 256M)
@@ -13,15 +17,7 @@ dev_id=$(_add_ublk_dev -t loop -z "$backfile_0")
 _check_add_dev $TID $? "$backfile_0"
 
 # run fio over the ublk disk
-fio --name=write_and_verify \
-    --filename=/dev/ublkb"${dev_id}" \
-    --ioengine=libaio --iodepth=64 \
-    --rw=write \
-    --size=256M \
-    --direct=1 \
-    --verify=crc32c \
-    --do_verify=1 \
-    --bs=4k > /dev/null 2>&1
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M
 ERR_CODE=$?
 
 _cleanup_test "loop"
diff --git a/tools/testing/selftests/ublk/test_loop_05.sh b/tools/testing/selftests/ublk/test_loop_05.sh
new file mode 100755
index 000000000000..2e6e2e6978fc
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_loop_05.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="loop_05"
+ERR_CODE=0
+
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "loop" "write and verify test"
+
+backfile_0=$(_create_backfile 256M)
+
+dev_id=$(_add_ublk_dev -q 2 -t loop "$backfile_0")
+_check_add_dev $TID $? "${backfile_0}"
+
+# run fio over the ublk disk
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=256M
+ERR_CODE=$?
+
+_cleanup_test "loop"
+
+_remove_backfile "$backfile_0"
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_stripe_01.sh b/tools/testing/selftests/ublk/test_stripe_01.sh
index c01f3dc325ab..7e387ef656ea 100755
--- a/tools/testing/selftests/ublk/test_stripe_01.sh
+++ b/tools/testing/selftests/ublk/test_stripe_01.sh
@@ -6,6 +6,10 @@
 TID="stripe_01"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "stripe" "write and verify test"
 
 backfile_0=$(_create_backfile 256M)
@@ -15,15 +19,7 @@ dev_id=$(_add_ublk_dev -t stripe "$backfile_0" "$backfile_1")
 _check_add_dev $TID $? "${backfile_0}"
 
 # run fio over the ublk disk
-fio --name=write_and_verify \
-    --filename=/dev/ublkb"${dev_id}" \
-    --ioengine=libaio --iodepth=32 \
-    --rw=write \
-    --size=512M \
-    --direct=1 \
-    --verify=crc32c \
-    --do_verify=1 \
-    --bs=4k > /dev/null 2>&1
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=512M
 ERR_CODE=$?
 
 _cleanup_test "stripe"
diff --git a/tools/testing/selftests/ublk/test_stripe_03.sh b/tools/testing/selftests/ublk/test_stripe_03.sh
new file mode 100755
index 000000000000..c1b34af36145
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_stripe_03.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="stripe_03"
+ERR_CODE=0
+
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "stripe" "write and verify test"
+
+backfile_0=$(_create_backfile 256M)
+backfile_1=$(_create_backfile 256M)
+
+dev_id=$(_add_ublk_dev -q 2 -t stripe "$backfile_0" "$backfile_1")
+_check_add_dev $TID $? "${backfile_0}"
+
+# run fio over the ublk disk
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" --size=512M
+ERR_CODE=$?
+
+_cleanup_test "stripe"
+
+_remove_backfile "$backfile_0"
+_remove_backfile "$backfile_1"
+
+_show_result $TID $ERR_CODE
-- 
2.47.0



* [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter
  2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
                   ` (6 preceding siblings ...)
  2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
@ 2025-03-24 13:49 ` Ming Lei
  7 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-24 13:49 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Keith Busch, Uday Shankar, Ming Lei

ublk zero copy usually requires the dma and segment parameters to be set
correctly, so hard-code the null target's dma & segment parameters to
non-default values, and verify that they are set up correctly by the ublk
driver.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/testing/selftests/ublk/Makefile         |  1 +
 tools/testing/selftests/ublk/null.c           | 11 +++++++-
 .../testing/selftests/ublk/test_generic_03.sh | 28 +++++++++++++++++++
 3 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/ublk/test_generic_03.sh

diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 7a8c994de244..d98680d64a2f 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -5,6 +5,7 @@ LDLIBS += -lpthread -lm -luring
 
 TEST_PROGS := test_generic_01.sh
 TEST_PROGS += test_generic_02.sh
+TEST_PROGS += test_generic_03.sh
 
 TEST_PROGS += test_null_01.sh
 TEST_PROGS += test_null_02.sh
diff --git a/tools/testing/selftests/ublk/null.c b/tools/testing/selftests/ublk/null.c
index 899875ff50fe..91fec3690d4b 100644
--- a/tools/testing/selftests/ublk/null.c
+++ b/tools/testing/selftests/ublk/null.c
@@ -17,7 +17,8 @@ static int ublk_null_tgt_init(const struct dev_ctx *ctx, struct ublk_dev *dev)
 
 	dev->tgt.dev_size = dev_size;
 	dev->tgt.params = (struct ublk_params) {
-		.types = UBLK_PARAM_TYPE_BASIC,
+		.types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DMA_ALIGN |
+			UBLK_PARAM_TYPE_SEGMENT,
 		.basic = {
 			.logical_bs_shift	= 9,
 			.physical_bs_shift	= 12,
@@ -26,6 +27,14 @@ static int ublk_null_tgt_init(const struct dev_ctx *ctx, struct ublk_dev *dev)
 			.max_sectors		= info->max_io_buf_bytes >> 9,
 			.dev_sectors		= dev_size >> 9,
 		},
+		.dma = {
+			.alignment 		= 4095,
+		},
+		.seg = {
+			.seg_boundary_mask 	= 4095,
+			.max_segment_size 	= 32 << 10,
+			.max_segments 		= 32,
+		},
 	};
 
 	if (info->flags & UBLK_F_SUPPORT_ZERO_COPY)
diff --git a/tools/testing/selftests/ublk/test_generic_03.sh b/tools/testing/selftests/ublk/test_generic_03.sh
new file mode 100755
index 000000000000..b551aa76cb0d
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_03.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_03"
+ERR_CODE=0
+
+_prep_test "null" "check dma & segment limits for zero copy"
+
+dev_id=$(_add_ublk_dev -t null -z)
+_check_add_dev $TID $?
+
+sysfs_path=/sys/block/ublkb"${dev_id}"
+dma_align=$(cat "$sysfs_path"/queue/dma_alignment)
+max_segments=$(cat "$sysfs_path"/queue/max_segments)
+max_segment_size=$(cat "$sysfs_path"/queue/max_segment_size)
+if [ "$dma_align" != "4095" ]; then
+	ERR_CODE=255
+fi
+if [ "$max_segments" != "32" ]; then
+	ERR_CODE=255
+fi
+if [ "$max_segment_size" != "32768" ]; then
+	ERR_CODE=255
+fi
+_cleanup_test "null"
+_show_result $TID $ERR_CODE
-- 
2.47.0



* Re: [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue'
  2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
@ 2025-03-24 14:32   ` Caleb Sander Mateos
  0 siblings, 0 replies; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-24 14:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar, Ming Lei

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> From: Ming Lei <tom.leiming@gmail.com>
>
> Remove two unused fields (`io_addr` & `max_io_sz`) from `struct ublk_queue`.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>


* Re: [PATCH 2/8] ublk: add helper of ublk_need_map_io()
  2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
@ 2025-03-24 15:01   ` Caleb Sander Mateos
  0 siblings, 0 replies; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-24 15:01 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar, Ming Lei

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> From: Ming Lei <tom.leiming@gmail.com>
>
> ublk_need_map_io() is more readable than open-coding
> !ublk_support_user_copy().

Agreed!

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>


* Re: [PATCH 3/8] ublk: truncate io command result
  2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
@ 2025-03-24 15:51   ` Caleb Sander Mateos
  2025-03-25  0:50     ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-24 15:51 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> If the io command result is bigger than the request bytes, truncate it to
> the request bytes. This is more reliable and avoids potential risk, even
> though both blk_update_request() and ublk_copy_user_pages() work fine
> either way.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 6fa1384c6436..acb6aed7be75 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1071,6 +1071,10 @@ static inline void __ublk_complete_rq(struct request *req)
>                 goto exit;
>         }
>
> +       /* truncate result in case it is bigger than request bytes */
> +       if (io->res > blk_rq_bytes(req))
> +               io->res = blk_rq_bytes(req);

Is this not already handled by the code below that caps io->res?

unmapped_bytes = ublk_unmap_io(ubq, req, io);
// ...
if (unlikely(unmapped_bytes < io->res))
        io->res = unmapped_bytes;

ublk_unmap_io() returns either blk_rq_bytes(req) or the result of
ublk_copy_user_pages(), which should be at most blk_rq_bytes(req)?

Best,
Caleb


* Re: [PATCH 6/8] ublk: implement ->queue_rqs()
  2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
@ 2025-03-24 17:07   ` Uday Shankar
  2025-03-25  1:02     ` Ming Lei
  2025-03-26 21:30   ` Caleb Sander Mateos
  1 sibling, 1 reply; 26+ messages in thread
From: Uday Shankar @ 2025-03-24 17:07 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Caleb Sander Mateos, Keith Busch

On Mon, Mar 24, 2025 at 09:49:01PM +0800, Ming Lei wrote:
> Implement ->queue_rqs() to improve performance in the MQ case.
> 
> This way, we only need to call io_uring_cmd_complete_in_task() once per
> batch, and both io_uring and the ublk server can see the exact batch
> issued from the client side.
> 
> IOPS improvement is as follows:
> 
> - tests
> 
> 	tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
> 
> 	fio/t/io_uring -p0 /dev/ublkb0
> 
> - results:
> 
> 	more than 10% IOPS boost observed

Nice!

> 
> Passes all ublk selftests, especially the io dispatch order test.
> 
> Cc: Uday Shankar <ushankar@purestorage.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 77 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 53a463681a41..86621fde7fde 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -83,6 +83,7 @@ struct ublk_rq_data {
>  struct ublk_uring_cmd_pdu {
>  	struct ublk_queue *ubq;
>  	u16 tag;
> +	struct rq_list list;
>  };

I think you'll have a hole after tag here and end up eating the full 32
bytes on x86_64. Maybe put list before tag to leave a bit of space for
some small fields?
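
i.e. something like this (just a sketch):

	struct ublk_uring_cmd_pdu {
		struct ublk_queue *ubq;	/* 8 bytes */
		struct rq_list list;	/* 16 bytes, two pointers */
		u16 tag;		/* ~6 bytes still free after this
					 * for future small fields */
	};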

Separately I think we should try to consolidate all the per-IO state. It
is currently split across struct io_uring_cmd PDU, struct request and
its PDU, and struct ublk_io. Maybe we can store everything in struct
ublk_io and just put pointers to that where needed.

>  
>  /*
> @@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
>  	io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
>  }
>  
> +static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> +		unsigned int issue_flags)
> +{
> +	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> +	struct ublk_queue *ubq = pdu->ubq;
> +	struct request *rq;
> +
> +	while ((rq = rq_list_pop(&pdu->list))) {
> +		struct ublk_io *io = &ubq->ios[rq->tag];
> +
> +		ublk_rq_task_work_cb(io->cmd, issue_flags);

Maybe rename to ublk_dispatch_one_rq or similar?

> +	}
> +}
> +
> +static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> +{
> +	struct request *rq = l->head;
> +	struct ublk_io *io = &ubq->ios[rq->tag];
> +	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
> +
> +	pdu->ubq = ubq;
> +	pdu->list = *l;
> +	rq_list_init(l);
> +	io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);
> +}
> +
>  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>  {
>  	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> @@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>  	return BLK_EH_RESET_TIMER;
>  }
>  
> -static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> -		const struct blk_mq_queue_data *bd)
> +static blk_status_t ublk_prep_rq_batch(struct request *rq)
>  {
> -	struct ublk_queue *ubq = hctx->driver_data;
> -	struct request *rq = bd->rq;
> +	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
>  	blk_status_t res;
>  
> -	if (unlikely(ubq->fail_io)) {
> +	if (unlikely(ubq->fail_io))
>  		return BLK_STS_TARGET;
> -	}
>  
>  	/* fill iod to slot in io cmd buffer */
>  	res = ublk_setup_iod(ubq, rq);
> @@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
>  		return BLK_STS_IOERR;
>  
> +	if (unlikely(ubq->canceling))
> +		return BLK_STS_IOERR;
> +
> +	blk_mq_start_request(rq);
> +	return BLK_STS_OK;
> +}
> +
> +static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> +		const struct blk_mq_queue_data *bd)
> +{
> +	struct ublk_queue *ubq = hctx->driver_data;
> +	struct request *rq = bd->rq;
> +	blk_status_t res;
> +
>  	if (unlikely(ubq->canceling)) {
>  		__ublk_abort_rq(ubq, rq);
>  		return BLK_STS_OK;
>  	}
>  
> -	blk_mq_start_request(bd->rq);
> -	ublk_queue_cmd(ubq, rq);
> +	res = ublk_prep_rq_batch(rq);
> +	if (res != BLK_STS_OK)
> +		return res;
>  
> +	ublk_queue_cmd(ubq, rq);
>  	return BLK_STS_OK;
>  }
>  
> +static void ublk_queue_rqs(struct rq_list *rqlist)
> +{
> +	struct rq_list requeue_list = { };
> +	struct rq_list submit_list = { };
> +	struct ublk_queue *ubq = NULL;
> +	struct request *req;
> +
> +	while ((req = rq_list_pop(rqlist))) {
> +		struct ublk_queue *this_q = req->mq_hctx->driver_data;
> +
> +		if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
> +			ublk_queue_cmd_list(ubq, &submit_list);
> +		ubq = this_q;
> +
> +		if (ublk_prep_rq_batch(req) == BLK_STS_OK)
> +			rq_list_add_tail(&submit_list, req);
> +		else
> +			rq_list_add_tail(&requeue_list, req);
> +	}
> +
> +	if (ubq && !rq_list_empty(&submit_list))
> +		ublk_queue_cmd_list(ubq, &submit_list);
> +	*rqlist = requeue_list;
> +}
> +
>  static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
>  		unsigned int hctx_idx)
>  {
> @@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
>  
>  static const struct blk_mq_ops ublk_mq_ops = {
>  	.queue_rq       = ublk_queue_rq,
> +	.queue_rqs      = ublk_queue_rqs,
>  	.init_hctx	= ublk_init_hctx,
>  	.timeout	= ublk_timeout,
>  };
> @@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
>  	BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
>  			UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
>  
> +	BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
> +			sizeof_field(struct io_uring_cmd, pdu));
> +

Consider putting this near the struct ublk_uring_cmd_pdu definition or
in ublk_get_uring_cmd_pdu. There are helpers provided by io_uring
(io_uring_cmd_private_sz_check and io_uring_cmd_to_pdu) you can use.
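
e.g. a sketch on top of this patch, assuming io_uring_cmd_to_pdu() from
include/linux/io_uring/cmd.h (it runs the size check internally via
io_uring_cmd_private_sz_check()):

	static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu(
			struct io_uring_cmd *ioucmd)
	{
		return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu);
	}

Then the explicit BUILD_BUG_ON() in ublk_init() wouldn't be needed.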

>  	init_waitqueue_head(&ublk_idr_wq);
>  
>  	ret = misc_register(&ublk_misc);
> -- 
> 2.47.0
> 


* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
@ 2025-03-24 22:26   ` Caleb Sander Mateos
  2025-03-25  1:15     ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-24 22:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> IO split is usually bad in the io_uring world, since it causes -EAGAIN and
> IO handling may have to fall back to io-wq, which hurts performance.
>
> ublk recently started to support zero copy. To avoid unnecessary IO split,
> the ublk driver's segment limit should be aligned with the backend
> device's segment limit.
>
> Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> whose count matches the ublk request's segment count, so a big memory
> allocation can be avoided by setting a reasonable max_segments limit.
>
> So add a segment parameter to give the ublk server a chance to align its
> segment limit with the backend, and to keep the limit reasonable from an
> implementation viewpoint.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
>  include/uapi/linux/ublk_cmd.h |  9 +++++++++
>  2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index acb6aed7be75..53a463681a41 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -74,7 +74,7 @@
>  #define UBLK_PARAM_TYPE_ALL                                \
>         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
>          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> -        UBLK_PARAM_TYPE_DMA_ALIGN)
> +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
>
>  struct ublk_rq_data {
>         struct kref ref;
> @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
>                         return -EINVAL;
>         }
>
> +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> +               const struct ublk_param_segment *p = &ub->params.seg;
> +
> +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> +                       return -EINVAL;

Looking at blk_validate_limits(), it seems like there are some
additional requirements? Looks like seg_boundary_mask has to be at
least PAGE_SIZE - 1 and max_segment_size has to be at least PAGE_SIZE
if virt_boundary_mask is set?

Aside from that, this looks good to me.

Best,
Caleb

> +       }
> +
>         return 0;
>  }
>
> @@ -2350,6 +2357,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
>         if (ub->params.types & UBLK_PARAM_TYPE_DMA_ALIGN)
>                 lim.dma_alignment = ub->params.dma.alignment;
>
> +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> +               lim.seg_boundary_mask = ub->params.seg.seg_boundary_mask;
> +               lim.max_segment_size = ub->params.seg.max_segment_size;
> +               lim.max_segments = ub->params.seg.max_segments;
> +       }
> +
>         if (wait_for_completion_interruptible(&ub->completion) != 0)
>                 return -EINTR;
>
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 7255b36b5cf6..83c2b94251f0 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -410,6 +410,13 @@ struct ublk_param_dma_align {
>         __u8    pad[4];
>  };
>
> +struct ublk_param_segment {
> +       __u64   seg_boundary_mask;
> +       __u32   max_segment_size;
> +       __u16   max_segments;
> +       __u8    pad[2];
> +};
> +
>  struct ublk_params {
>         /*
>          * Total length of parameters, userspace has to set 'len' for both
> @@ -423,6 +430,7 @@ struct ublk_params {
>  #define UBLK_PARAM_TYPE_DEVT            (1 << 2)
>  #define UBLK_PARAM_TYPE_ZONED           (1 << 3)
>  #define UBLK_PARAM_TYPE_DMA_ALIGN       (1 << 4)
> +#define UBLK_PARAM_TYPE_SEGMENT         (1 << 5)
>         __u32   types;                  /* types of parameter included */
>
>         struct ublk_param_basic         basic;
> @@ -430,6 +438,7 @@ struct ublk_params {
>         struct ublk_param_devt          devt;
>         struct ublk_param_zoned zoned;
>         struct ublk_param_dma_align     dma;
> +       struct ublk_param_segment       seg;
>  };
>
>  #endif
> --
> 2.47.0
>


* Re: [PATCH 3/8] ublk: truncate io command result
  2025-03-24 15:51   ` Caleb Sander Mateos
@ 2025-03-25  0:50     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-25  0:50 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 08:51:20AM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > If the io command result is bigger than the request bytes, truncate it to
> > the request bytes. This is more reliable and avoids potential risk, even
> > though both blk_update_request() and ublk_copy_user_pages() work fine
> > either way.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 6fa1384c6436..acb6aed7be75 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -1071,6 +1071,10 @@ static inline void __ublk_complete_rq(struct request *req)
> >                 goto exit;
> >         }
> >
> > +       /* truncate result in case it is bigger than request bytes */
> > +       if (io->res > blk_rq_bytes(req))
> > +               io->res = blk_rq_bytes(req);
> 
> Is this not already handled by the code below that caps io->res?
> 
> unmapped_bytes = ublk_unmap_io(ubq, req, io);
> // ...
> if (unlikely(unmapped_bytes < io->res))
>         io->res = unmapped_bytes;
> 
> ublk_unmap_io() returns either blk_rq_bytes(req) or the result of
> ublk_copy_user_pages(), which should be at most blk_rq_bytes(req)?

Indeed, this patch can be dropped.


thanks,
Ming



* Re: [PATCH 6/8] ublk: implement ->queue_rqs()
  2025-03-24 17:07   ` Uday Shankar
@ 2025-03-25  1:02     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-25  1:02 UTC (permalink / raw)
  To: Uday Shankar; +Cc: Jens Axboe, linux-block, Caleb Sander Mateos, Keith Busch

On Mon, Mar 24, 2025 at 11:07:21AM -0600, Uday Shankar wrote:
> On Mon, Mar 24, 2025 at 09:49:01PM +0800, Ming Lei wrote:
> > Implement ->queue_rqs() to improve performance in the MQ case.
> > 
> > This way, we only need to call io_uring_cmd_complete_in_task() once per
> > batch, and both io_uring and the ublk server can see the exact batch
> > issued from the client side.
> > 
> > IOPS improvement is as follows:
> > 
> > - tests
> > 
> > 	tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
> > 
> > 	fio/t/io_uring -p0 /dev/ublkb0
> > 
> > - results:
> > 
> > 	more than 10% IOPS boost observed
> 
> Nice!
> 
> > 
> > Passes all ublk selftests, especially the io dispatch order test.
> > 
> > Cc: Uday Shankar <ushankar@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 77 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 53a463681a41..86621fde7fde 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -83,6 +83,7 @@ struct ublk_rq_data {
> >  struct ublk_uring_cmd_pdu {
> >  	struct ublk_queue *ubq;
> >  	u16 tag;
> > +	struct rq_list list;
> >  };
> 
> I think you'll have a hole after tag here and end up eating the full 32
> bytes on x86_64. Maybe put list before tag to leave a bit of space for
> some small fields?

Actually the rq_list member here can share space with `tag` or even
`ubq`; will do that in the next version.
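
For example the simpler variant (sketch): the single-request path only
uses `tag` and the batch path only uses `list`, so the two are never live
at the same time and can overlap:

	struct ublk_uring_cmd_pdu {
		struct ublk_queue *ubq;
		union {
			u16 tag;		/* ->queue_rq() path */
			struct rq_list list;	/* ->queue_rqs() path */
		};
	};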

> 
> Separately I think we should try to consolidate all the per-IO state. It
> is currently split across struct io_uring_cmd PDU, struct request and
> its PDU, and struct ublk_io. Maybe we can store everything in struct
> ublk_io and just put pointers to that where needed.

uring_cmd and request are two different things:

1) io_uring_cmd & ublk_io have a longer lifetime than the request, since the
command covers both request delivery and notification; for the latter, there
isn't a request yet for this slot.

2) in this case, the state has to be put into the command payload because
io_uring knows nothing about the request; we need to retrieve the request or
ublk_io from the uring_cmd payload

3) both ublk_io and the request payload are pre-allocated, which should be
avoided as much as possible, so I think it may not be good to put everything
into ublk_io.

> 
> >  
> >  /*
> > @@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
> >  	io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
> >  }
> >  
> > +static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> > +		unsigned int issue_flags)
> > +{
> > +	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > +	struct ublk_queue *ubq = pdu->ubq;
> > +	struct request *rq;
> > +
> > +	while ((rq = rq_list_pop(&pdu->list))) {
> > +		struct ublk_io *io = &ubq->ios[rq->tag];
> > +
> > +		ublk_rq_task_work_cb(io->cmd, issue_flags);
> 
> Maybe rename to ublk_dispatch_one_rq or similar?

ublk_rq_task_work_cb() is used as the uring_cmd callback in the
->queue_rq() case...

> 
> > +	}
> > +}
> > +
> > +static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> > +{
> > +	struct request *rq = l->head;
> > +	struct ublk_io *io = &ubq->ios[rq->tag];
> > +	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
> > +
> > +	pdu->ubq = ubq;
> > +	pdu->list = *l;
> > +	rq_list_init(l);
> > +	io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);
> > +}
> > +
> >  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  {
> >  	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > @@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  	return BLK_EH_RESET_TIMER;
> >  }
> >  
> > -static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > -		const struct blk_mq_queue_data *bd)
> > +static blk_status_t ublk_prep_rq_batch(struct request *rq)
> >  {
> > -	struct ublk_queue *ubq = hctx->driver_data;
> > -	struct request *rq = bd->rq;
> > +	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> >  	blk_status_t res;
> >  
> > -	if (unlikely(ubq->fail_io)) {
> > +	if (unlikely(ubq->fail_io))
> >  		return BLK_STS_TARGET;
> > -	}
> >  
> >  	/* fill iod to slot in io cmd buffer */
> >  	res = ublk_setup_iod(ubq, rq);
> > @@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  	if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
> >  		return BLK_STS_IOERR;
> >  
> > +	if (unlikely(ubq->canceling))
> > +		return BLK_STS_IOERR;
> > +
> > +	blk_mq_start_request(rq);
> > +	return BLK_STS_OK;
> > +}
> > +
> > +static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > +		const struct blk_mq_queue_data *bd)
> > +{
> > +	struct ublk_queue *ubq = hctx->driver_data;
> > +	struct request *rq = bd->rq;
> > +	blk_status_t res;
> > +
> >  	if (unlikely(ubq->canceling)) {
> >  		__ublk_abort_rq(ubq, rq);
> >  		return BLK_STS_OK;
> >  	}
> >  
> > -	blk_mq_start_request(bd->rq);
> > -	ublk_queue_cmd(ubq, rq);
> > +	res = ublk_prep_rq_batch(rq);
> > +	if (res != BLK_STS_OK)
> > +		return res;
> >  
> > +	ublk_queue_cmd(ubq, rq);
> >  	return BLK_STS_OK;
> >  }
> >  
> > +static void ublk_queue_rqs(struct rq_list *rqlist)
> > +{
> > +	struct rq_list requeue_list = { };
> > +	struct rq_list submit_list = { };
> > +	struct ublk_queue *ubq = NULL;
> > +	struct request *req;
> > +
> > +	while ((req = rq_list_pop(rqlist))) {
> > +		struct ublk_queue *this_q = req->mq_hctx->driver_data;
> > +
> > +		if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
> > +			ublk_queue_cmd_list(ubq, &submit_list);
> > +		ubq = this_q;
> > +
> > +		if (ublk_prep_rq_batch(req) == BLK_STS_OK)
> > +			rq_list_add_tail(&submit_list, req);
> > +		else
> > +			rq_list_add_tail(&requeue_list, req);
> > +	}
> > +
> > +	if (ubq && !rq_list_empty(&submit_list))
> > +		ublk_queue_cmd_list(ubq, &submit_list);
> > +	*rqlist = requeue_list;
> > +}
> > +
> >  static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >  		unsigned int hctx_idx)
> >  {
> > @@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >  
> >  static const struct blk_mq_ops ublk_mq_ops = {
> >  	.queue_rq       = ublk_queue_rq,
> > +	.queue_rqs      = ublk_queue_rqs,
> >  	.init_hctx	= ublk_init_hctx,
> >  	.timeout	= ublk_timeout,
> >  };
> > @@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
> >  	BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
> >  			UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
> >  
> > +	BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
> > +			sizeof_field(struct io_uring_cmd, pdu));
> > +
> 
> Consider putting this near the struct ublk_uring_cmd_pdu definition or
> in ublk_get_uring_cmd_pdu. There are helpers provided by io_uring
> (io_uring_cmd_private_sz_check and io_uring_cmd_to_pdu) you can use.

Yeah, we can do that, but it looks like the usual practice is to add the
build check in the init fn.

Thanks,
Ming



* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-24 22:26   ` Caleb Sander Mateos
@ 2025-03-25  1:15     ` Ming Lei
  2025-03-25 19:43       ` Caleb Sander Mateos
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-25  1:15 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > IO handling may have to fallback to io-wq, this way does hurt performance.
> >
> > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > split, ublk driver's segment limit should be aligned with backend
> > device's segment limit.
> >
> > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > which number is aligned with ublk request segment number, so that big
> > memory allocation can be avoided by setting reasonable max_segments limit.
> >
> > So add segment parameter for providing ublk server chance to align
> > segment limit with backend, and keep it reasonable from implementation
> > viewpoint.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
> >  include/uapi/linux/ublk_cmd.h |  9 +++++++++
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index acb6aed7be75..53a463681a41 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -74,7 +74,7 @@
> >  #define UBLK_PARAM_TYPE_ALL                                \
> >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> >
> >  struct ublk_rq_data {
> >         struct kref ref;
> > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> >                         return -EINVAL;
> >         }
> >
> > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > +               const struct ublk_param_segment *p = &ub->params.seg;
> > +
> > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > +                       return -EINVAL;
> 
> Looking at blk_validate_limits(), it seems like there are some
> additional requirements? Looks like seg_boundary_mask has to be at
> least PAGE_SIZE - 1

Yeah, it isn't done in ublk because the block layer runs the check, and
starting the device will fail. That said, we would be relying on the block
layer's default setting, which isn't good from a UAPI viewpoint, since the
block layer may change its defaults.

Also it is bad to tie a device property to PAGE_SIZE, which is actually a
variable. The latest kernel has replaced PAGE_SIZE with 4096 for segment
limits.

I think we can take 4096 for validation here.
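
A sketch of the resulting check with a 4096 floor (the revised patch later
in this thread spells it out via UBLK_MIN_SEGMENT_SIZE):

```c
	if (!is_power_of_2(p->seg_boundary_mask + 1))
		return -EINVAL;
	if (p->seg_boundary_mask + 1 < 4096)
		return -EINVAL;
	if (p->max_segment_size < 4096)
		return -EINVAL;
```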

> and max_segment_size has to be at least PAGE_SIZE
> if virt_boundary_mask is set?

If virt_boundary_mask is set, max_segment_size is usually ignored, except
for some stacking devices.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] ublk: document zero copy feature
  2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
@ 2025-03-25 15:26   ` Caleb Sander Mateos
  2025-03-26  2:29     ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-25 15:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Add words to explain how zero copy feature works, and why it has to be
> trusted for handling IO read command.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> index 1e0e7358e14a..33efff25b54d 100644
> --- a/Documentation/block/ublk.rst
> +++ b/Documentation/block/ublk.rst
> @@ -309,18 +309,30 @@ with specified IO tag in the command data:
>    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
>    the server buffer (pages) read to the IO request pages.
>
> -Future development
> -==================
> -
>  Zero copy
>  ---------
>
> -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> -can't be remapped any more in kernel with existing mm interfaces. This can
> -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.

nit: missing parentheses after io_buffer_unregister_bvec

> +
> +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> +`io_buffer_register_bvec()` for ublk server to register client request
> +buffer into io_uring buffer table, then ublk server can submit io_uring
> +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> +calls `io_buffer_unregister_bvec` to unregister the buffer.

Parentheses missing here too.
It might be worth adding a note that the registered buffer and any I/O
that uses it will hold a reference on the ublk request. For a ublk
server implementer, I think it's useful to know that the buffer needs
to be unregistered in order to complete the ublk request, and that the
zero-copy I/O won't corrupt any data even if it completes after the
buffer is unregistered.

> +
> +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> +because it is ublk server's responsibility to make sure IO buffer filled
> +with data, and ublk server has to handle short read correctly by returning
> +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> +will be exposed to client application.

This isn't specific to zero-copy, right? ublk devices configured with
UBLK_F_USER_COPY also need to initialize the full request buffer. I
would also drop the "handle short read" part; ublk servers don't
necessarily issue read operations in the backend, so "short read" may
not be applicable.

Best,
Caleb

> +
> +ublk server needs to align the parameter of `struct ublk_param_dma_align`
> +with backend for zero copy to work correctly.
>
> +For reaching best IO performance, ublk server should align its segment
> +parameter of `struct ublk_param_segment` with backend for avoiding
> +unnecessary IO split.
>
>  References
>  ==========
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-25  1:15     ` Ming Lei
@ 2025-03-25 19:43       ` Caleb Sander Mateos
  2025-03-26  2:17         ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-25 19:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > > IO handling may have to fallback to io-wq, this way does hurt performance.
> > >
> > > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > > split, ublk driver's segment limit should be aligned with backend
> > > device's segment limit.
> > >
> > > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > > which number is aligned with ublk request segment number, so that big
> > > memory allocation can be avoided by setting reasonable max_segments limit.
> > >
> > > So add segment parameter for providing ublk server chance to align
> > > segment limit with backend, and keep it reasonable from implementation
> > > viewpoint.
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
> > >  include/uapi/linux/ublk_cmd.h |  9 +++++++++
> > >  2 files changed, 23 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index acb6aed7be75..53a463681a41 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -74,7 +74,7 @@
> > >  #define UBLK_PARAM_TYPE_ALL                                \
> > >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> > >
> > >  struct ublk_rq_data {
> > >         struct kref ref;
> > > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > >                         return -EINVAL;
> > >         }
> > >
> > > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > > +               const struct ublk_param_segment *p = &ub->params.seg;
> > > +
> > > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > > +                       return -EINVAL;
> >
> > Looking at blk_validate_limits(), it seems like there are some
> > additional requirements? Looks like seg_boundary_mask has to be at
> > least PAGE_SIZE - 1
>
> Yeah, it isn't done in ublk because block layer runs the check, and it
> will be failed when starting the device. That said we take block layer's
> default setting, which isn't good from UAPI viewpoint, since block
> layer may change the default setting.

Even though blk_validate_limits() rejects it, it appears to log a
warning. That seems undesirable for something controllable from
userspace.
/*
 * By default there is no limit on the segment boundary alignment,
 * but if there is one it can't be smaller than the page size as
 * that would break all the normal I/O patterns.
 */
if (!lim->seg_boundary_mask)
        lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1))
        return -EINVAL;

>
> Also it is bad to associate device property with PAGE_SIZE which is
> a variable actually. The latest kernel has replaced PAGE_SIZE with 4096
> for segment limits.
>
> I think we can take 4096 for validation here.
>
> > and max_segment_size has to be at least PAGE_SIZE
> > if virt_boundary_mask is set?
>
> If virt_boundary_mask is set, max_segment_size will be ignored usually
> except for some stacking devices.

Sorry, I had it backwards. The requirement is if virt_boundary_mask is
*not* set:
/*
 * Stacking device may have both virtual boundary and max segment
 * size limit, so allow this setting now, and long-term the two
 * might need to move out of stacking limits since we have immutable
 * bvec and lower layer bio splitting is supposed to handle the two
 * correctly.
 */
if (lim->virt_boundary_mask) {
        if (!lim->max_segment_size)
                lim->max_segment_size = UINT_MAX;
} else {
        /*
         * The maximum segment size has an odd historic 64k default that
         * drivers probably should override.  Just like the I/O size we
         * require drivers to at least handle a full page per segment.
         */
        if (!lim->max_segment_size)
                lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
        if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE))
                return -EINVAL;
}

Best,
Caleb

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-25 19:43       ` Caleb Sander Mateos
@ 2025-03-26  2:17         ` Ming Lei
  2025-03-26 16:43           ` Caleb Sander Mateos
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-26  2:17 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Tue, Mar 25, 2025 at 12:43:26PM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > > > IO handling may have to fallback to io-wq, this way does hurt performance.
> > > >
> > > > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > > > split, ublk driver's segment limit should be aligned with backend
> > > > device's segment limit.
> > > >
> > > > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > > > which number is aligned with ublk request segment number, so that big
> > > > memory allocation can be avoided by setting reasonable max_segments limit.
> > > >
> > > > So add segment parameter for providing ublk server chance to align
> > > > segment limit with backend, and keep it reasonable from implementation
> > > > viewpoint.
> > > >
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
> > > >  include/uapi/linux/ublk_cmd.h |  9 +++++++++
> > > >  2 files changed, 23 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index acb6aed7be75..53a463681a41 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -74,7 +74,7 @@
> > > >  #define UBLK_PARAM_TYPE_ALL                                \
> > > >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > > >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > > > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > > > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> > > >
> > > >  struct ublk_rq_data {
> > > >         struct kref ref;
> > > > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > > >                         return -EINVAL;
> > > >         }
> > > >
> > > > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > > > +               const struct ublk_param_segment *p = &ub->params.seg;
> > > > +
> > > > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > > > +                       return -EINVAL;
> > >
> > > Looking at blk_validate_limits(), it seems like there are some
> > > additional requirements? Looks like seg_boundary_mask has to be at
> > > least PAGE_SIZE - 1
> >
> > Yeah, it isn't done in ublk because block layer runs the check, and it
> > will be failed when starting the device. That said we take block layer's
> > default setting, which isn't good from UAPI viewpoint, since block
> > layer may change the default setting.
> 
> Even though blk_validate_limits() rejects it, it appears to log a
> warning. That seems undesirable for something controllable from
> userspace.
> /*
>  * By default there is no limit on the segment boundary alignment,
>  * but if there is one it can't be smaller than the page size as
>  * that would break all the normal I/O patterns.
>  */
> if (!lim->seg_boundary_mask)
>         lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1))
>         return -EINVAL;

Yes, it has been addressed in my local version, and we need to make it a
well-defined hw/sw interface.

> 
> >
> > Also it is bad to associate device property with PAGE_SIZE which is
> > a variable actually. The latest kernel has replaced PAGE_SIZE with 4096
> > for segment limits.
> >
> > I think we can take 4096 for validation here.
> >
> > > and max_segment_size has to be at least PAGE_SIZE
> > > if virt_boundary_mask is set?
> >
> > If virt_boundary_mask is set, max_segment_size will be ignored usually
> > except for some stacking devices.
> 
> Sorry, I had it backwards. The requirement is if virt_boundary_mask is
> *not* set:
> /*
>  * Stacking device may have both virtual boundary and max segment
>  * size limit, so allow this setting now, and long-term the two
>  * might need to move out of stacking limits since we have immutable
>  * bvec and lower layer bio splitting is supposed to handle the two
>  * correctly.
>  */
> if (lim->virt_boundary_mask) {
>         if (!lim->max_segment_size)
>                 lim->max_segment_size = UINT_MAX;
> } else {
>         /*
>          * The maximum segment size has an odd historic 64k default that
>          * drivers probably should override.  Just like the I/O size we
>          * require drivers to at least handle a full page per segment.
>          */
>         if (!lim->max_segment_size)
>                 lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
>         if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE))
>                 return -EINVAL;
> }

Right.

Please feel free to see if the revised patch is good:


From 0718b9f130b3bc9b9b06907c687fb5b9eea172f7 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Mon, 24 Mar 2025 12:33:59 +0000
Subject: [PATCH V2 3/8] ublk: add segment parameter

IO split is usually bad in the io_uring world, since it causes -EAGAIN and
IO handling may have to fall back to io-wq, which hurts performance.

ublk recently started to support zero copy. To avoid unnecessary IO split,
the ublk driver's segment limits should be aligned with the backend
device's segment limits.

Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
whose number matches the ublk request's segment count, so big memory
allocations can be avoided by setting a reasonable max_segments limit.

So add a segment parameter that gives the ublk server a chance to align
its segment limits with the backend, while keeping them reasonable from an
implementation viewpoint.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c      | 20 +++++++++++++++++++-
 include/uapi/linux/ublk_cmd.h | 21 +++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6fa1384c6436..6367476cef2b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -74,7 +74,7 @@
 #define UBLK_PARAM_TYPE_ALL                                \
 	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
 	 UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
-	 UBLK_PARAM_TYPE_DMA_ALIGN)
+	 UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
 
 struct ublk_rq_data {
 	struct kref ref;
@@ -580,6 +580,18 @@ static int ublk_validate_params(const struct ublk_device *ub)
 			return -EINVAL;
 	}
 
+	if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
+		const struct ublk_param_segment *p = &ub->params.seg;
+
+		if (!is_power_of_2(p->seg_boundary_mask + 1))
+			return -EINVAL;
+
+		if (p->seg_boundary_mask + 1 < UBLK_MIN_SEGMENT_SIZE)
+			return -EINVAL;
+		if (p->max_segment_size < UBLK_MIN_SEGMENT_SIZE)
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -2346,6 +2358,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 	if (ub->params.types & UBLK_PARAM_TYPE_DMA_ALIGN)
 		lim.dma_alignment = ub->params.dma.alignment;
 
+	if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
+		lim.seg_boundary_mask = ub->params.seg.seg_boundary_mask;
+		lim.max_segment_size = ub->params.seg.max_segment_size;
+		lim.max_segments = ub->params.seg.max_segments;
+	}
+
 	if (wait_for_completion_interruptible(&ub->completion) != 0)
 		return -EINTR;
 
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 7255b36b5cf6..ffa805b05141 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -410,6 +410,25 @@ struct ublk_param_dma_align {
 	__u8	pad[4];
 };
 
+#define UBLK_MIN_SEGMENT_SIZE   4096
+struct ublk_param_segment {
+	/*
+	 * seg_boundary_mask + 1 needs to be a power of 2, and the sum has
+	 * to be >= UBLK_MIN_SEGMENT_SIZE (4096)
+	 */
+	__u64 	seg_boundary_mask;
+
+	/*
+	 * max_segment_size could be overridden by virt_boundary_mask, so be
+	 * careful when setting both.
+	 *
+	 * max_segment_size has to be >= UBLK_MIN_SEGMENT_SIZE(4096)
+	 */
+	__u32 	max_segment_size;
+	__u16 	max_segments;
+	__u8	pad[2];
+};
+
 struct ublk_params {
 	/*
 	 * Total length of parameters, userspace has to set 'len' for both
@@ -423,6 +442,7 @@ struct ublk_params {
 #define UBLK_PARAM_TYPE_DEVT            (1 << 2)
 #define UBLK_PARAM_TYPE_ZONED           (1 << 3)
 #define UBLK_PARAM_TYPE_DMA_ALIGN       (1 << 4)
+#define UBLK_PARAM_TYPE_SEGMENT         (1 << 5)
 	__u32	types;			/* types of parameter included */
 
 	struct ublk_param_basic		basic;
@@ -430,6 +450,7 @@ struct ublk_params {
 	struct ublk_param_devt		devt;
 	struct ublk_param_zoned	zoned;
 	struct ublk_param_dma_align	dma;
+	struct ublk_param_segment	seg;
 };
 
 #endif
-- 
2.47.0
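
For context, a hypothetical server-side use of the new parameter (the
UBLK_U_CMD_SET_PARAMS plumbing and the other mandatory params are elided;
the values only illustrate mirroring a backend's limits):

```c
struct ublk_params p = {
	.types = UBLK_PARAM_TYPE_SEGMENT,
	.seg = {
		.seg_boundary_mask = 4095,	/* 4K boundary, mask+1 pow2 */
		.max_segment_size  = 1U << 20,	/* backend's max segment */
		.max_segments      = 32,	/* backend's max_segments */
	},
};
```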



Thanks,
Ming


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] ublk: document zero copy feature
  2025-03-25 15:26   ` Caleb Sander Mateos
@ 2025-03-26  2:29     ` Ming Lei
  2025-03-26 16:48       ` Caleb Sander Mateos
  0 siblings, 1 reply; 26+ messages in thread
From: Ming Lei @ 2025-03-26  2:29 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Add words to explain how zero copy feature works, and why it has to be
> > trusted for handling IO read command.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > index 1e0e7358e14a..33efff25b54d 100644
> > --- a/Documentation/block/ublk.rst
> > +++ b/Documentation/block/ublk.rst
> > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> >    the server buffer (pages) read to the IO request pages.
> >
> > -Future development
> > -==================
> > -
> >  Zero copy
> >  ---------
> >
> > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > -can't be remapped any more in kernel with existing mm interfaces. This can
> > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> 
> nit: missing parentheses after io_buffer_unregister_bvec

OK

> 
> > +
> > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > +`io_buffer_register_bvec()` for ublk server to register client request
> > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> 
> Parentheses missing here too.
> It might be worth adding a note that the registered buffer and any I/O
> that uses it will hold a reference on the ublk request. For a ublk
> server implementer, I think it's useful to know that the buffer needs
> to be unregistered in order to complete the ublk request, and that the
> zero-copy I/O won't corrupt any data even if it completes after the
> buffer is unregistered.

Good point, how about the following words?

```
The ublk client IO request buffer is guaranteed to be live between calling
`io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.

And any io_uring operation which supports this kind of kernel buffer will
grab one reference of the buffer until the operation is completed.
```
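
For illustration, a rough server-side sketch of that lifetime (raw SQE
setup against /dev/ublkcN; `struct ublksrv_io_cmd` and the
UBLK_U_IO_*_IO_BUF opcodes come from include/uapi/linux/ublk_cmd.h, the
helper name is made up, and ring setup/teardown is elided):

```c
#include <string.h>
#include <liburing.h>
#include <linux/ublk_cmd.h>

/* sketch: build a zero-copy (un)register command for one ublk request */
static void ublk_prep_buf_cmd(struct io_uring_sqe *sqe, int ublkc_fd,
			      __u32 cmd_op, __u16 q_id, __u16 tag,
			      __u64 buf_index)
{
	struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = ublkc_fd;
	sqe->cmd_op = cmd_op;	/* UBLK_U_IO_REGISTER_IO_BUF or
				 * UBLK_U_IO_UNREGISTER_IO_BUF */
	cmd->q_id = q_id;
	cmd->tag = tag;
	cmd->addr = buf_index;	/* io_uring buffer-table slot */
}
```

The per-request ordering is then: register the buffer, submit backend
fixed-buffer IO against that slot, unregister, and commit; the ublk request
stays referenced until both the unregister and any in-flight fixed-buffer
IO have completed.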

> 
> > +
> > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > +because it is ublk server's responsibility to make sure IO buffer filled
> > +with data, and ublk server has to handle short read correctly by returning
> > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > +will be exposed to client application.
> 
> This isn't specific to zero-copy, right? ublk devices configured with
> UBLK_F_USER_COPY also need to initialize the full request buffer. I

Right, but it is important for zero copy.

> would also drop the "handle short read" part; ublk servers don't
> necessarily issue read operations in the backend, so "short read" may
> not be applicable.

Here 'read' in 'short read' means ublk front READ command, not actual read
done in ublk server. Maybe we can make it more accurate:

```
..., and ublk server has to return correct result to ublk driver when handling
READ command, and the result has to match with how many bytes filled to the IO
buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
application.
```
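
And a sketch of what that means on the wire (error paths and the
non-zero-copy buffer address handling elided; `bytes_filled` must match
what the server actually wrote into the IO buffer):

```c
#include <string.h>
#include <liburing.h>
#include <linux/ublk_cmd.h>

/* sketch: commit a ublk READ result and fetch the next IO for this tag */
static void ublk_commit_read(struct io_uring_sqe *sqe, int ublkc_fd,
			     __u16 q_id, __u16 tag, __s32 bytes_filled)
{
	struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = ublkc_fd;
	sqe->cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
	cmd->q_id = q_id;
	cmd->tag = tag;
	/*
	 * Over-reporting here is exactly what exposes uninitialized
	 * kernel buffer contents to the client application.
	 */
	cmd->result = bytes_filled;
}
```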

Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-26  2:17         ` Ming Lei
@ 2025-03-26 16:43           ` Caleb Sander Mateos
  2025-03-27  1:49             ` Ming Lei
  0 siblings, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-26 16:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Tue, Mar 25, 2025 at 7:17 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Mar 25, 2025 at 12:43:26PM -0700, Caleb Sander Mateos wrote:
> > On Mon, Mar 24, 2025 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> > > > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > >
> > > > > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > > > > IO handling may have to fallback to io-wq, this way does hurt performance.
> > > > >
> > > > > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > > > > split, ublk driver's segment limit should be aligned with backend
> > > > > device's segment limit.
> > > > >
> > > > > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > > > > which number is aligned with ublk request segment number, so that big
> > > > > memory allocation can be avoided by setting reasonable max_segments limit.
> > > > >
> > > > > So add segment parameter for providing ublk server chance to align
> > > > > segment limit with backend, and keep it reasonable from implementation
> > > > > viewpoint.
> > > > >
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > ---
> > > > >  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
> > > > >  include/uapi/linux/ublk_cmd.h |  9 +++++++++
> > > > >  2 files changed, 23 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index acb6aed7be75..53a463681a41 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -74,7 +74,7 @@
> > > > >  #define UBLK_PARAM_TYPE_ALL                                \
> > > > >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > > > >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > > > > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > > > > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> > > > >
> > > > >  struct ublk_rq_data {
> > > > >         struct kref ref;
> > > > > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > > > >                         return -EINVAL;
> > > > >         }
> > > > >
> > > > > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > > > > +               const struct ublk_param_segment *p = &ub->params.seg;
> > > > > +
> > > > > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > > > > +                       return -EINVAL;
> > > >
> > > > Looking at blk_validate_limits(), it seems like there are some
> > > > additional requirements? Looks like seg_boundary_mask has to be at
> > > > least PAGE_SIZE - 1
> > >
> > > Yeah, it isn't done in ublk because block layer runs the check, and it
> > > will be failed when starting the device. That said we take block layer's
> > > default setting, which isn't good from UAPI viewpoint, since block
> > > layer may change the default setting.
> >
> > Even though blk_validate_limits() rejects it, it appears to log a
> > warning. That seems undesirable for something controllable from
> > userspace.
> > /*
> >  * By default there is no limit on the segment boundary alignment,
> >  * but if there is one it can't be smaller than the page size as
> >  * that would break all the normal I/O patterns.
> >  */
> > if (!lim->seg_boundary_mask)
> >         lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> > if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1))
> >         return -EINVAL;
>
> Yes, it has been addressed in my local version, and we need to make it
> a hw/sw interface.
>
> >
> > >
> > > Also it is bad to associate device property with PAGE_SIZE which is
> > > a variable actually. The latest kernel has replaced PAGE_SIZE with 4096
> > > for segment limits.
> > >
> > > I think we can take 4096 for validation here.
> > >
> > > > and max_segment_size has to be at least PAGE_SIZE
> > > > if virt_boundary_mask is set?
> > >
> > > If virt_boundary_mask is set, max_segment_size will be ignored usually
> > > except for some stacking devices.
> >
> > Sorry, I had it backwards. The requirement is if virt_boundary_mask is
> > *not* set:
> > /*
> >  * Stacking device may have both virtual boundary and max segment
> >  * size limit, so allow this setting now, and long-term the two
> >  * might need to move out of stacking limits since we have immutable
> >  * bvec and lower layer bio splitting is supposed to handle the two
> >  * correctly.
> >  */
> > if (lim->virt_boundary_mask) {
> >         if (!lim->max_segment_size)
> >                 lim->max_segment_size = UINT_MAX;
> > } else {
> >         /*
> >          * The maximum segment size has an odd historic 64k default that
> >          * drivers probably should override.  Just like the I/O size we
> >          * require drivers to at least handle a full page per segment.
> >          */
> >         if (!lim->max_segment_size)
> >                 lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
> >         if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE))
> >                 return -EINVAL;
> > }
>
> Right.
>
> Please feel free to see if the revised patch is good:
>
>
> From 0718b9f130b3bc9b9b06907c687fb5b9eea172f7 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@redhat.com>
> Date: Mon, 24 Mar 2025 12:33:59 +0000
> Subject: [PATCH V2 3/8] ublk: add segment parameter
>
> IO split is usually bad in io_uring world, since -EAGAIN is caused and
> IO handling may have to fallback to io-wq, this way does hurt performance.
>
> ublk starts to support zero copy recently, for avoiding unnecessary IO
> split, ublk driver's segment limit should be aligned with backend
> device's segment limit.
>
> Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> which number is aligned with ublk request segment number, so that big
> memory allocation can be avoided by setting reasonable max_segments limit.
>
> So add segment parameter for providing ublk server chance to align
> segment limit with backend, and keep it reasonable from implementation
> viewpoint.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c      | 20 +++++++++++++++++++-
>  include/uapi/linux/ublk_cmd.h | 21 +++++++++++++++++++++
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 6fa1384c6436..6367476cef2b 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -74,7 +74,7 @@
>  #define UBLK_PARAM_TYPE_ALL                                \
>         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
>          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> -        UBLK_PARAM_TYPE_DMA_ALIGN)
> +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
>
>  struct ublk_rq_data {
>         struct kref ref;
> @@ -580,6 +580,18 @@ static int ublk_validate_params(const struct ublk_device *ub)
>                         return -EINVAL;
>         }
>
> +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> +               const struct ublk_param_segment *p = &ub->params.seg;
> +
> +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> +                       return -EINVAL;
> +
> +               if (p->seg_boundary_mask + 1 < UBLK_MIN_SEGMENT_SIZE)
> +                       return -EINVAL;
> +               if (p->max_segment_size < UBLK_MIN_SEGMENT_SIZE)
> +                       return -EINVAL;

These checks look good, except they don't allow omitting
seg_boundary_mask or max_segment_size. I can imagine a ublk server
might want to set only some of the segment limits and leave the others
as 0?

Best,
Caleb

> +       }
> +
>         return 0;
>  }
>
> @@ -2346,6 +2358,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
>         if (ub->params.types & UBLK_PARAM_TYPE_DMA_ALIGN)
>                 lim.dma_alignment = ub->params.dma.alignment;
>
> +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> +               lim.seg_boundary_mask = ub->params.seg.seg_boundary_mask;
> +               lim.max_segment_size = ub->params.seg.max_segment_size;
> +               lim.max_segments = ub->params.seg.max_segments;
> +       }
> +
>         if (wait_for_completion_interruptible(&ub->completion) != 0)
>                 return -EINTR;
>
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index 7255b36b5cf6..ffa805b05141 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -410,6 +410,25 @@ struct ublk_param_dma_align {
>         __u8    pad[4];
>  };
>
> +#define UBLK_MIN_SEGMENT_SIZE   4096
> +struct ublk_param_segment {
> +       /*
> +        * seg_boundary_mask + 1 needs to be power_of_2(), and the sum has
> +        * to be >= UBLK_MIN_SEGMENT_SIZE(4096)
> +        */
> +       __u64   seg_boundary_mask;
> +
> +       /*
> +        * max_segment_size could be override by virt_boundary_mask, so be
> +        * careful when setting both.
> +        *
> +        * max_segment_size has to be >= UBLK_MIN_SEGMENT_SIZE(4096)
> +        */
> +       __u32   max_segment_size;
> +       __u16   max_segments;
> +       __u8    pad[2];
> +};
> +
>  struct ublk_params {
>         /*
>          * Total length of parameters, userspace has to set 'len' for both
> @@ -423,6 +442,7 @@ struct ublk_params {
>  #define UBLK_PARAM_TYPE_DEVT            (1 << 2)
>  #define UBLK_PARAM_TYPE_ZONED           (1 << 3)
>  #define UBLK_PARAM_TYPE_DMA_ALIGN       (1 << 4)
> +#define UBLK_PARAM_TYPE_SEGMENT         (1 << 5)
>         __u32   types;                  /* types of parameter included */
>
>         struct ublk_param_basic         basic;
> @@ -430,6 +450,7 @@ struct ublk_params {
>         struct ublk_param_devt          devt;
>         struct ublk_param_zoned zoned;
>         struct ublk_param_dma_align     dma;
> +       struct ublk_param_segment       seg;
>  };
>
>  #endif
> --
> 2.47.0
>
>
>
> Thanks,
> Ming
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] ublk: document zero copy feature
  2025-03-26  2:29     ` Ming Lei
@ 2025-03-26 16:48       ` Caleb Sander Mateos
  0 siblings, 0 replies; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-26 16:48 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Tue, Mar 25, 2025 at 7:30 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > Add words to explain how zero copy feature works, and why it has to be
> > > trusted for handling IO read command.
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> > >  1 file changed, 20 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > > index 1e0e7358e14a..33efff25b54d 100644
> > > --- a/Documentation/block/ublk.rst
> > > +++ b/Documentation/block/ublk.rst
> > > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> > >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> > >    the server buffer (pages) read to the IO request pages.
> > >
> > > -Future development
> > > -==================
> > > -
> > >  Zero copy
> > >  ---------
> > >
> > > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > > -can't be remapped any more in kernel with existing mm interfaces. This can
> > > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> >
> > nit: missing parentheses after io_buffer_unregister_bvec
>
> OK
>
> >
> > > +
> > > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > > +`io_buffer_register_bvec()` for ublk server to register client request
> > > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> >
> > Parentheses missing here too.
> > It might be worth adding a note that the registered buffer and any I/O
> > that uses it will hold a reference on the ublk request. For a ublk
> > server implementer, I think it's useful to know that the buffer needs
> > to be unregistered in order to complete the ublk request, and that the
> > zero-copy I/O won't corrupt any data even if it completes after the
> > buffer is unregistered.
>
> Good point, how about the following words?
>
> ```
> The ublk client IO request buffer is guaranteed to be live between calling
> `io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.
>
> And any io_uring operation which supports this kind of kernel buffer will
> grab one reference of the buffer until the operation is completed.
> ```

That's definitely better. I think it could be a bit more explicit that
the buffer needs to be unregistered in order for the ublk request to
complete.

>
> >
> > > +
> > > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > > +because it is ublk server's responsibility to make sure IO buffer filled
> > > +with data, and ublk server has to handle short read correctly by returning
> > > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > > +will be exposed to client application.
> >
> > This isn't specific to zero-copy, right? ublk devices configured with
> > UBLK_F_USER_COPY also need to initialize the full request buffer. I
>
> Right, but it is important for zero copy.

Sure. Maybe change "zero copy" to "zero copy or user copy"?

>
> > would also drop the "handle short read" part; ublk servers don't
> > necessarily issue read operations in the backend, so "short read" may
> > not be applicable.
>
> Here 'read' in 'short read' means ublk front READ command, not actual read
> done in ublk server. Maybe we can make it more accurate:
>
> ```
> ..., and ublk server has to return correct result to ublk driver when handling
> READ command, and the result has to match with how many bytes filled to the IO
> buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
> application.
> ```

Thanks for clarifying. That looks great!

Best,
Caleb

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] ublk: implement ->queue_rqs()
  2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
  2025-03-24 17:07   ` Uday Shankar
@ 2025-03-26 21:30   ` Caleb Sander Mateos
  2025-03-27  9:15     ` Ming Lei
  1 sibling, 1 reply; 26+ messages in thread
From: Caleb Sander Mateos @ 2025-03-26 21:30 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Implement ->queue_rqs() for improving perf in case of MQ.
>
> In this way, we just need to call io_uring_cmd_complete_in_task() once for
> one batch, then both io_uring and ublk server can get exact batch from
> client side.
>
> Follows IOPS improvement:
>
> - tests
>
>         tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
>
>         fio/t/io_uring -p0 /dev/ublkb0
>
> - results:
>
>         more than 10% IOPS boost observed
>
> Pass all ublk selftests, especially the io dispatch order test.
>
> Cc: Uday Shankar <ushankar@purestorage.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 77 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 53a463681a41..86621fde7fde 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -83,6 +83,7 @@ struct ublk_rq_data {
>  struct ublk_uring_cmd_pdu {
>         struct ublk_queue *ubq;
>         u16 tag;
> +       struct rq_list list;
>  };
>
>  /*
> @@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
>         io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
>  }
>
> +static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> +               unsigned int issue_flags)
> +{
> +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> +       struct ublk_queue *ubq = pdu->ubq;
> +       struct request *rq;
> +
> +       while ((rq = rq_list_pop(&pdu->list))) {
> +               struct ublk_io *io = &ubq->ios[rq->tag];
> +
> +               ublk_rq_task_work_cb(io->cmd, issue_flags);

ublk_rq_task_work_cb() is duplicating the lookup of ubq, rq, and io.
Could you factor out a helper that takes those values instead of cmd?
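
Something like this, perhaps (just the shape; `ublk_dispatch_req()` is a
hypothetical name for the hoisted body of ublk_rq_task_work_cb()):

```c
static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
		unsigned int issue_flags)
{
	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
	struct ublk_queue *ubq = pdu->ubq;
	struct request *rq;

	/* both callers pass down what they have already looked up */
	while ((rq = rq_list_pop(&pdu->list)))
		ublk_dispatch_req(ubq, rq, issue_flags);
}
```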

> +       }
> +}
> +
> +static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> +{
> +       struct request *rq = l->head;
> +       struct ublk_io *io = &ubq->ios[rq->tag];
> +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
> +
> +       pdu->ubq = ubq;

Why does pdu->ubq need to be set here but not in ublk_queue_cmd()? I
would have thought it would already be set to ubq because pdu comes
from a rq belonging to this ubq.

> +       pdu->list = *l;
> +       rq_list_init(l);
> +       io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);

Could store io->cmd in a variable to avoid looking it up twice.

> +}
> +
>  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>  {
>         struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> @@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
>         return BLK_EH_RESET_TIMER;
>  }
>
> -static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> -               const struct blk_mq_queue_data *bd)
> +static blk_status_t ublk_prep_rq_batch(struct request *rq)

naming nit: why "batch"?

>  {
> -       struct ublk_queue *ubq = hctx->driver_data;
> -       struct request *rq = bd->rq;
> +       struct ublk_queue *ubq = rq->mq_hctx->driver_data;
>         blk_status_t res;
>
> -       if (unlikely(ubq->fail_io)) {
> +       if (unlikely(ubq->fail_io))
>                 return BLK_STS_TARGET;
> -       }
>
>         /* fill iod to slot in io cmd buffer */
>         res = ublk_setup_iod(ubq, rq);
> @@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>         if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
>                 return BLK_STS_IOERR;
>
> +       if (unlikely(ubq->canceling))
> +               return BLK_STS_IOERR;

Why is ubq->canceling treated differently for ->queue_rq() vs. ->queue_rqs()?

> +
> +       blk_mq_start_request(rq);
> +       return BLK_STS_OK;
> +}
> +
> +static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> +               const struct blk_mq_queue_data *bd)
> +{
> +       struct ublk_queue *ubq = hctx->driver_data;
> +       struct request *rq = bd->rq;
> +       blk_status_t res;
> +
>         if (unlikely(ubq->canceling)) {
>                 __ublk_abort_rq(ubq, rq);
>                 return BLK_STS_OK;
>         }
>
> -       blk_mq_start_request(bd->rq);
> -       ublk_queue_cmd(ubq, rq);
> +       res = ublk_prep_rq_batch(rq);
> +       if (res != BLK_STS_OK)
> +               return res;
>
> +       ublk_queue_cmd(ubq, rq);
>         return BLK_STS_OK;
>  }
>
> +static void ublk_queue_rqs(struct rq_list *rqlist)
> +{
> +       struct rq_list requeue_list = { };
> +       struct rq_list submit_list = { };
> +       struct ublk_queue *ubq = NULL;
> +       struct request *req;
> +
> +       while ((req = rq_list_pop(rqlist))) {
> +               struct ublk_queue *this_q = req->mq_hctx->driver_data;
> +
> +               if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
> +                       ublk_queue_cmd_list(ubq, &submit_list);
> +               ubq = this_q;

Probably could avoid the extra ->driver_data dereference on every rq
by comparing the mq_hctx pointers instead. The ->driver_data
dereference could be moved to the ublk_queue_cmd_list() calls.
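
E.g. (a sketch of batching on the hctx pointer instead):

```c
static void ublk_queue_rqs(struct rq_list *rqlist)
{
	struct rq_list requeue_list = { };
	struct rq_list submit_list = { };
	struct blk_mq_hw_ctx *hctx = NULL;
	struct request *req;

	while ((req = rq_list_pop(rqlist))) {
		/* compare hctx pointers; chase ->driver_data only on flush */
		if (hctx && req->mq_hctx != hctx &&
		    !rq_list_empty(&submit_list))
			ublk_queue_cmd_list(hctx->driver_data, &submit_list);
		hctx = req->mq_hctx;

		if (ublk_prep_rq_batch(req) == BLK_STS_OK)
			rq_list_add_tail(&submit_list, req);
		else
			rq_list_add_tail(&requeue_list, req);
	}

	if (hctx && !rq_list_empty(&submit_list))
		ublk_queue_cmd_list(hctx->driver_data, &submit_list);
	*rqlist = requeue_list;
}
```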

> +
> +               if (ublk_prep_rq_batch(req) == BLK_STS_OK)
> +                       rq_list_add_tail(&submit_list, req);
> +               else
> +                       rq_list_add_tail(&requeue_list, req);
> +       }
> +
> +       if (ubq && !rq_list_empty(&submit_list))
> +               ublk_queue_cmd_list(ubq, &submit_list);
> +       *rqlist = requeue_list;
> +}
> +
>  static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
>                 unsigned int hctx_idx)
>  {
> @@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
>
>  static const struct blk_mq_ops ublk_mq_ops = {
>         .queue_rq       = ublk_queue_rq,
> +       .queue_rqs      = ublk_queue_rqs,
>         .init_hctx      = ublk_init_hctx,
>         .timeout        = ublk_timeout,
>  };
> @@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
>         BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
>                         UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
>
> +       BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
> +                       sizeof_field(struct io_uring_cmd, pdu));

Looks like Uday also suggested this, but if you change
ublk_get_uring_cmd_pdu() to use io_uring_cmd_to_pdu(), you get this
check for free.

Best,
Caleb


> +
>         init_waitqueue_head(&ublk_idr_wq);
>
>         ret = misc_register(&ublk_misc);
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] ublk: add segment parameter
  2025-03-26 16:43           ` Caleb Sander Mateos
@ 2025-03-27  1:49             ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-27  1:49 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Wed, Mar 26, 2025 at 09:43:13AM -0700, Caleb Sander Mateos wrote:
> On Tue, Mar 25, 2025 at 7:17 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Mar 25, 2025 at 12:43:26PM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Mar 24, 2025 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> > > > > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > >
> > > > > > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > > > > > IO handling may have to fallback to io-wq, this way does hurt performance.
> > > > > >
> > > > > > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > > > > > split, ublk driver's segment limit should be aligned with backend
> > > > > > device's segment limit.
> > > > > >
> > > > > > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > > > > > which number is aligned with ublk request segment number, so that big
> > > > > > memory allocation can be avoided by setting reasonable max_segments limit.
> > > > > >
> > > > > > So add segment parameter for providing ublk server chance to align
> > > > > > segment limit with backend, and keep it reasonable from implementation
> > > > > > viewpoint.
> > > > > >
> > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > > ---
> > > > > >  drivers/block/ublk_drv.c      | 15 ++++++++++++++-
> > > > > >  include/uapi/linux/ublk_cmd.h |  9 +++++++++
> > > > > >  2 files changed, 23 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > > index acb6aed7be75..53a463681a41 100644
> > > > > > --- a/drivers/block/ublk_drv.c
> > > > > > +++ b/drivers/block/ublk_drv.c
> > > > > > @@ -74,7 +74,7 @@
> > > > > >  #define UBLK_PARAM_TYPE_ALL                                \
> > > > > >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > > > > >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > > > > > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > > > > > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> > > > > >
> > > > > >  struct ublk_rq_data {
> > > > > >         struct kref ref;
> > > > > > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > > > > >                         return -EINVAL;
> > > > > >         }
> > > > > >
> > > > > > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > > > > > +               const struct ublk_param_segment *p = &ub->params.seg;
> > > > > > +
> > > > > > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > > > > > +                       return -EINVAL;
> > > > >
> > > > > Looking at blk_validate_limits(), it seems like there are some
> > > > > additional requirements? Looks like seg_boundary_mask has to be at
> > > > > least PAGE_SIZE - 1
> > > >
> > > > Yeah, it isn't done in ublk because block layer runs the check, and it
> > > > will be failed when starting the device. That said we take block layer's
> > > > default setting, which isn't good from UAPI viewpoint, since block
> > > > layer may change the default setting.
> > >
> > > Even though blk_validate_limits() rejects it, it appears to log a
> > > warning. That seems undesirable for something controllable from
> > > userspace.
> > > /*
> > >  * By default there is no limit on the segment boundary alignment,
> > >  * but if there is one it can't be smaller than the page size as
> > >  * that would break all the normal I/O patterns.
> > >  */
> > > if (!lim->seg_boundary_mask)
> > >         lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> > > if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1))
> > >         return -EINVAL;
> >
> > Yes, it has been addressed in my local version, and we need to make it
> > a hw/sw interface.
> >
> > >
> > > >
> > > > Also it is bad to associate device property with PAGE_SIZE which is
> > > > a variable actually. The latest kernel has replaced PAGE_SIZE with 4096
> > > > for segment limits.
> > > >
> > > > I think we can take 4096 for validation here.
> > > >
> > > > > and max_segment_size has to be at least PAGE_SIZE
> > > > > if virt_boundary_mask is set?
> > > >
> > > > If virt_boundary_mask is set, max_segment_size will be ignored usually
> > > > except for some stacking devices.
> > >
> > > Sorry, I had it backwards. The requirement is if virt_boundary_mask is
> > > *not* set:
> > > /*
> > >  * Stacking device may have both virtual boundary and max segment
> > >  * size limit, so allow this setting now, and long-term the two
> > >  * might need to move out of stacking limits since we have immutable
> > >  * bvec and lower layer bio splitting is supposed to handle the two
> > >  * correctly.
> > >  */
> > > if (lim->virt_boundary_mask) {
> > >         if (!lim->max_segment_size)
> > >                 lim->max_segment_size = UINT_MAX;
> > > } else {
> > >         /*
> > >          * The maximum segment size has an odd historic 64k default that
> > >          * drivers probably should override.  Just like the I/O size we
> > >          * require drivers to at least handle a full page per segment.
> > >          */
> > >         if (!lim->max_segment_size)
> > >                 lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
> > >         if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE))
> > >                 return -EINVAL;
> > > }
> >
> > Right.
> >
> > Please feel free to see if the revised patch is good:
> >
> >
> > From 0718b9f130b3bc9b9b06907c687fb5b9eea172f7 Mon Sep 17 00:00:00 2001
> > From: Ming Lei <ming.lei@redhat.com>
> > Date: Mon, 24 Mar 2025 12:33:59 +0000
> > Subject: [PATCH V2 3/8] ublk: add segment parameter
> >
> > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > IO handling may have to fallback to io-wq, this way does hurt performance.
> >
> > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > split, ublk driver's segment limit should be aligned with backend
> > device's segment limit.
> >
> > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > which number is aligned with ublk request segment number, so that big
> > memory allocation can be avoided by setting reasonable max_segments limit.
> >
> > So add segment parameter for providing ublk server chance to align
> > segment limit with backend, and keep it reasonable from implementation
> > viewpoint.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c      | 20 +++++++++++++++++++-
> >  include/uapi/linux/ublk_cmd.h | 21 +++++++++++++++++++++
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 6fa1384c6436..6367476cef2b 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -74,7 +74,7 @@
> >  #define UBLK_PARAM_TYPE_ALL                                \
> >         (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> >          UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED |    \
> > -        UBLK_PARAM_TYPE_DMA_ALIGN)
> > +        UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> >
> >  struct ublk_rq_data {
> >         struct kref ref;
> > @@ -580,6 +580,18 @@ static int ublk_validate_params(const struct ublk_device *ub)
> >                         return -EINVAL;
> >         }
> >
> > +       if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > +               const struct ublk_param_segment *p = &ub->params.seg;
> > +
> > +               if (!is_power_of_2(p->seg_boundary_mask + 1))
> > +                       return -EINVAL;
> > +
> > +               if (p->seg_boundary_mask + 1 < UBLK_MIN_SEGMENT_SIZE)
> > +                       return -EINVAL;
> > +               if (p->max_segment_size < UBLK_MIN_SEGMENT_SIZE)
> > +                       return -EINVAL;
> 
> These checks look good, except they don't allow omitting
> seg_boundary_mask or max_segment_size. I can imagine a ublk server
> might want to set only some of the segment limits and leave the others
> as 0?

Any unset field in `ublk_param_segment` is effectively undefined behavior;
it should be either rejected or given a default value, and we can document
that.

Thanks for raising the issue.

`seg_boundary_mask` is very similar to `max_segment_size`, so if either of
the two is set, the unset one should get a default value.

And if either of the two is set, 'max_segments' can't be zero.
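
One possible shape of that defaulting (the concrete defaults here are
assumptions, not necessarily what the final patch will do):

```c
	if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
		struct ublk_param_segment *p = &ub->params.seg;

		if (!p->seg_boundary_mask)
			p->seg_boundary_mask = UBLK_MIN_SEGMENT_SIZE - 1;
		if (!p->max_segment_size)
			p->max_segment_size = UINT_MAX;
		/* once the type bit is set, a segment count is required */
		if (!p->max_segments)
			return -EINVAL;
	}
```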


Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] ublk: implement ->queue_rqs()
  2025-03-26 21:30   ` Caleb Sander Mateos
@ 2025-03-27  9:15     ` Ming Lei
  0 siblings, 0 replies; 26+ messages in thread
From: Ming Lei @ 2025-03-27  9:15 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Keith Busch, Uday Shankar

On Wed, Mar 26, 2025 at 02:30:56PM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Implement ->queue_rqs() for improving perf in case of MQ.
> >
> > In this way, we just need to call io_uring_cmd_complete_in_task() once for
> > one batch, then both io_uring and ublk server can get exact batch from
> > client side.
> >
> > Follows IOPS improvement:
> >
> > - tests
> >
> >         tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
> >
> >         fio/t/io_uring -p0 /dev/ublkb0
> >
> > - results:
> >
> >         more than 10% IOPS boost observed
> >
> > Pass all ublk selftests, especially the io dispatch order test.
> >
> > Cc: Uday Shankar <ushankar@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 77 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 53a463681a41..86621fde7fde 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -83,6 +83,7 @@ struct ublk_rq_data {
> >  struct ublk_uring_cmd_pdu {
> >         struct ublk_queue *ubq;
> >         u16 tag;
> > +       struct rq_list list;
> >  };
> >
> >  /*
> > @@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
> >         io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
> >  }
> >
> > +static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> > +               unsigned int issue_flags)
> > +{
> > +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > +       struct ublk_queue *ubq = pdu->ubq;
> > +       struct request *rq;
> > +
> > +       while ((rq = rq_list_pop(&pdu->list))) {
> > +               struct ublk_io *io = &ubq->ios[rq->tag];
> > +
> > +               ublk_rq_task_work_cb(io->cmd, issue_flags);
> 
> ublk_rq_task_work_cb() is duplicating the lookup of ubq, rq, and io.
> Could you factor out a helper that takes those values instead of cmd?
> 
> > +       }
> > +}
> > +
> > +static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> > +{
> > +       struct request *rq = l->head;
> > +       struct ublk_io *io = &ubq->ios[rq->tag];
> > +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
> > +
> > +       pdu->ubq = ubq;
> 
> Why does pdu->ubq need to be set here but not in ublk_queue_cmd()? I
> would have thought it would already be set to ubq because pdu comes
> from a rq belonging to this ubq.
> 
> > +       pdu->list = *l;
> > +       rq_list_init(l);
> > +       io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);
> 
> Could store io->cmd in a variable to avoid looking it up twice.
> 
> > +}
> > +
> >  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  {
> >         struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > @@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >         return BLK_EH_RESET_TIMER;
> >  }
> >
> > -static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > -               const struct blk_mq_queue_data *bd)
> > +static blk_status_t ublk_prep_rq_batch(struct request *rq)
> 
> naming nit: why "batch"?

All of these are handled in my local version.
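
For reference, the factoring looks roughly like this (a sketch only: the
helper name ublk_dispatch_req() is made up here and its body is elided; it
also stores io->cmd in a local variable once, per the comment further down):

	static void ublk_dispatch_req(struct ublk_queue *ubq, struct request *req,
				      unsigned int issue_flags)
	{
		struct ublk_io *io = &ubq->ios[req->tag];

		/* body of today's ublk_rq_task_work_cb(), using ubq/req/io directly */
	}

	static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
					unsigned int issue_flags)
	{
		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
		struct ublk_queue *ubq = pdu->ubq;
		struct request *rq;

		while ((rq = rq_list_pop(&pdu->list)))
			ublk_dispatch_req(ubq, rq, issue_flags);
	}

	static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
	{
		struct request *rq = l->head;
		struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd;
		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);

		pdu->ubq = ubq;
		pdu->list = *l;
		rq_list_init(l);
		io_uring_cmd_complete_in_task(cmd, ublk_cmd_list_tw_cb);
	}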

> 
> >  {
> > -       struct ublk_queue *ubq = hctx->driver_data;
> > -       struct request *rq = bd->rq;
> > +       struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> >         blk_status_t res;
> >
> > -       if (unlikely(ubq->fail_io)) {
> > +       if (unlikely(ubq->fail_io))
> >                 return BLK_STS_TARGET;
> > -       }
> >
> >         /* fill iod to slot in io cmd buffer */
> >         res = ublk_setup_iod(ubq, rq);
> > @@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >         if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
> >                 return BLK_STS_IOERR;
> >
> > +       if (unlikely(ubq->canceling))
> > +               return BLK_STS_IOERR;
> 
> Why is ubq->cancelling treated differently for ->queue_rq() vs. ->queue_rqs()?

It is the same: a request that sees ubq->canceling fails the prep step, goes
onto the requeue list, and falls back to ->queue_rq(), where __ublk_abort_rq()
handles it, so ublk_queue_rqs() stays simpler.

What really needs a comment here is the ordering of the ubq->canceling
handling: it has to be done after ->fail_io/->force_abort are dealt with,
otherwise an IO hang is caused when removing the disk.

The new ublk_queue_rq() above checks ->canceling before those, so that
ordering is violated; that is one bug introduced in this patch.
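
Something along these lines, on the check in ublk_prep_rq_batch() (the
comment wording is just a sketch):

	/*
	 * ->canceling has to be handled after ->fail_io and ->force_abort
	 * are dealt with, otherwise queued IO can be left behind and
	 * disk removal hangs
	 */
	if (unlikely(ubq->canceling))
		return BLK_STS_IOERR;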

> 
> > +
> > +       blk_mq_start_request(rq);
> > +       return BLK_STS_OK;
> > +}
> > +
> > +static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > +               const struct blk_mq_queue_data *bd)
> > +{
> > +       struct ublk_queue *ubq = hctx->driver_data;
> > +       struct request *rq = bd->rq;
> > +       blk_status_t res;
> > +
> >         if (unlikely(ubq->canceling)) {
> >                 __ublk_abort_rq(ubq, rq);
> >                 return BLK_STS_OK;
> >         }
> >
> > -       blk_mq_start_request(bd->rq);
> > -       ublk_queue_cmd(ubq, rq);
> > +       res = ublk_prep_rq_batch(rq);
> > +       if (res != BLK_STS_OK)
> > +               return res;
> >
> > +       ublk_queue_cmd(ubq, rq);
> >         return BLK_STS_OK;
> >  }
> >
> > +static void ublk_queue_rqs(struct rq_list *rqlist)
> > +{
> > +       struct rq_list requeue_list = { };
> > +       struct rq_list submit_list = { };
> > +       struct ublk_queue *ubq = NULL;
> > +       struct request *req;
> > +
> > +       while ((req = rq_list_pop(rqlist))) {
> > +               struct ublk_queue *this_q = req->mq_hctx->driver_data;
> > +
> > +               if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
> > +                       ublk_queue_cmd_list(ubq, &submit_list);
> > +               ubq = this_q;
> 
> Probably could avoid the extra ->driver_data dereference on every rq
> by comparing the mq_hctx pointers instead. The ->driver_data
> dereference could be moved to the ublk_queue_cmd_list() calls.

Yes.
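
Concretely, something like this (sketch; the batch is keyed on the hctx
pointer and ->driver_data is only dereferenced at the flush points):

	static void ublk_queue_rqs(struct rq_list *rqlist)
	{
		struct rq_list requeue_list = { };
		struct rq_list submit_list = { };
		struct blk_mq_hw_ctx *hctx = NULL;
		struct request *req;

		while ((req = rq_list_pop(rqlist))) {
			/* flush the current batch when the hw queue changes */
			if (hctx && req->mq_hctx != hctx &&
			    !rq_list_empty(&submit_list))
				ublk_queue_cmd_list(hctx->driver_data, &submit_list);
			hctx = req->mq_hctx;

			if (ublk_prep_rq_batch(req) == BLK_STS_OK)
				rq_list_add_tail(&submit_list, req);
			else
				rq_list_add_tail(&requeue_list, req);
		}

		if (hctx && !rq_list_empty(&submit_list))
			ublk_queue_cmd_list(hctx->driver_data, &submit_list);
		*rqlist = requeue_list;
	}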

> 
> > +
> > +               if (ublk_prep_rq_batch(req) == BLK_STS_OK)
> > +                       rq_list_add_tail(&submit_list, req);
> > +               else
> > +                       rq_list_add_tail(&requeue_list, req);
> > +       }
> > +
> > +       if (ubq && !rq_list_empty(&submit_list))
> > +               ublk_queue_cmd_list(ubq, &submit_list);
> > +       *rqlist = requeue_list;
> > +}
> > +
> >  static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >                 unsigned int hctx_idx)
> >  {
> > @@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >
> >  static const struct blk_mq_ops ublk_mq_ops = {
> >         .queue_rq       = ublk_queue_rq,
> > +       .queue_rqs      = ublk_queue_rqs,
> >         .init_hctx      = ublk_init_hctx,
> >         .timeout        = ublk_timeout,
> >  };
> > @@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
> >         BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
> >                         UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
> >
> > +       BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
> > +                       sizeof_field(struct io_uring_cmd, pdu));
> 
> Looks like Uday also suggested this, but if you change
> ublk_get_uring_cmd_pdu() to use io_uring_cmd_to_pdu(), you get this
> check for free.

I have followed Uday's suggestion to patch ublk_get_uring_cmd_pdu(),
and will send out v2 soon.
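
For reference, the accessor then becomes something like the following;
io_uring_cmd_to_pdu() embeds the size check, which is why the explicit
BUILD_BUG_ON can be dropped:

	static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu(
			struct io_uring_cmd *ioucmd)
	{
		return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu);
	}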

Thanks,
Ming


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-03-27  9:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-03-24 14:32   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
2025-03-24 15:01   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
2025-03-24 15:51   ` Caleb Sander Mateos
2025-03-25  0:50     ` Ming Lei
2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
2025-03-24 22:26   ` Caleb Sander Mateos
2025-03-25  1:15     ` Ming Lei
2025-03-25 19:43       ` Caleb Sander Mateos
2025-03-26  2:17         ` Ming Lei
2025-03-26 16:43           ` Caleb Sander Mateos
2025-03-27  1:49             ` Ming Lei
2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
2025-03-25 15:26   ` Caleb Sander Mateos
2025-03-26  2:29     ` Ming Lei
2025-03-26 16:48       ` Caleb Sander Mateos
2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
2025-03-24 17:07   ` Uday Shankar
2025-03-25  1:02     ` Ming Lei
2025-03-26 21:30   ` Caleb Sander Mateos
2025-03-27  9:15     ` Ming Lei
2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei
