public inbox for linux-block@vger.kernel.org
* [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
@ 2026-04-09 13:30 Ming Lei
  2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

Hello Jens,

Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
addressing review feedback from Caleb Sander Mateos.

- Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
  (the __u32 field overflowed to 0 for exactly 4GB)
- Verify all pages in multi-page bvecs fall within the registered maple
  tree range, removing base_pfn from ublk_buf_range since mas.index
  provides the range start PFN
- Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
- Replace xarray with IDA for buffer index allocation, removing the
  unnecessary struct ublk_buf
- Allow buffer registration before device is started by taking ub->mutex
  before freezing the queue (same ordering as ublk_stop_dev_unlocked)
- Address documentation review comments
- Update MAINTAINERS email

Thanks,

Ming Lei (7):
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: allow buffer registration before device is started
  Documentation: ublk: address review comments for SHMEM_ZC docs
  MAINTAINERS: update ublk driver maintainer email

 Documentation/block/ublk.rst  |  10 +-
 MAINTAINERS                   |   2 +-
 drivers/block/ublk_drv.c      | 201 ++++++++++++++++------------------
 include/uapi/linux/ublk_cmd.h |   3 +-
 4 files changed, 101 insertions(+), 115 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

The __u32 len field cannot represent a 4GB buffer: 0x100000000
overflows to 0. Widen it to __u64 so buffers up to 4GB can be
registered. Add a reserved field to keep the struct 8-byte aligned
and validate that it is zero.

The kernel enforces a default maximum of 4GB (UBLK_SHMEM_BUF_SIZE_MAX),
which may be increased in the future.
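The truncation is easy to demonstrate in isolation. A standalone userspace C sketch (not driver code; the helper names are illustrative) of why a __u32 length cannot hold exactly 4GB:

```c
#include <assert.h>
#include <stdint.h>

/* A 4GB length stored in a 32-bit field truncates to 0, so the
 * driver's `!size` check would wrongly reject the registration. */
static uint32_t len_as_u32(uint64_t len)
{
	return (uint32_t)len;	/* old __u32 field: high bits lost */
}

static uint64_t len_as_u64(uint64_t len)
{
	return len;		/* new __u64 field: value preserved */
}
```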

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c      | 9 ++++++++-
 include/uapi/linux/ublk_cmd.h | 3 ++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2e475bdc54dd..ada9a2e32ea9 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -63,6 +63,9 @@
 #define UBLK_CMD_REG_BUF	_IOC_NR(UBLK_U_CMD_REG_BUF)
 #define UBLK_CMD_UNREG_BUF	_IOC_NR(UBLK_U_CMD_UNREG_BUF)
 
+/* Default max shmem buffer size: 4GB (may be increased in future) */
+#define UBLK_SHMEM_BUF_SIZE_MAX	(1ULL << 32)
+
 #define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
 #define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
 
@@ -5351,11 +5354,15 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 	if (buf_reg.flags & ~UBLK_SHMEM_BUF_READ_ONLY)
 		return -EINVAL;
 
+	if (buf_reg.reserved)
+		return -EINVAL;
+
 	addr = buf_reg.addr;
 	size = buf_reg.len;
 	nr_pages = size >> PAGE_SHIFT;
 
-	if (!size || !PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
+	if (!size || size > UBLK_SHMEM_BUF_SIZE_MAX ||
+	    !PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
 		return -EINVAL;
 
 	disk = ublk_get_disk(ub);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index ecd258847d3d..66d93efccd51 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -89,8 +89,9 @@
 /* Parameter buffer for UBLK_U_CMD_REG_BUF, pointed to by ctrl_cmd.addr */
 struct ublk_shmem_buf_reg {
 	__u64	addr;	/* userspace virtual address of shared memory */
-	__u32	len;	/* buffer size in bytes (page-aligned, max 4GB) */
+	__u64	len;	/* buffer size in bytes, page-aligned, default max 4GB */
 	__u32	flags;
+	__u32	reserved;
 };
 
 /* Pin pages without FOLL_WRITE; usable with write-sealed memfd */
-- 
2.53.0



* [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
  2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

rq_for_each_bvec() yields multi-page bvecs in which bv_page points
only at the first page. ublk_try_buf_match() validated only the start
PFN against the maple tree, but a bvec can span multiple pages and
extend past the end of a registered range.

Use mas_walk() instead of mtree_load() to obtain the range boundaries
stored in the maple tree, and check that the bvec's end PFN does not
exceed the range. Also remove base_pfn from struct ublk_buf_range
since mas.index already provides the range start PFN.
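The end-PFN arithmetic the patch adds can be modeled in plain C. A hypothetical userspace sketch (PAGE_SHIFT value and the helper name are illustrative, not from the driver):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Given a bvec's first PFN, its byte offset within that page, and
 * its length, compute the PFN holding the bvec's last byte. The bvec
 * fits in a registered range ending at mas.last only if this value
 * does not exceed mas.last. */
static unsigned long bvec_end_pfn(unsigned long pfn, unsigned int off,
				  unsigned int len)
{
	return pfn + ((off + len - 1) >> PAGE_SHIFT);
}
```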

Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index ada9a2e32ea9..f990c10e963a 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -304,7 +304,6 @@ struct ublk_buf {
 
 /* Maple tree value: maps a PFN range to buffer location */
 struct ublk_buf_range {
-	unsigned long base_pfn;
 	unsigned short buf_index;
 	unsigned short flags;
 	unsigned int base_offset;	/* byte offset within buffer */
@@ -5306,7 +5305,6 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
 		}
 		range->buf_index = index;
 		range->flags = flags;
-		range->base_pfn = pfn;
 		range->base_offset = start << PAGE_SHIFT;
 
 		ret = mtree_insert_range(&ub->buf_tree, pfn,
@@ -5451,8 +5449,8 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
 		if (range->buf_index != buf_index)
 			continue;
 
-		base = range->base_pfn;
-		nr = mas.last - mas.index + 1;
+		base = mas.index;
+		nr = mas.last - base + 1;
 		mas_erase(&mas);
 
 		for (off = 0; off < nr; ) {
@@ -5531,15 +5529,22 @@ static bool ublk_try_buf_match(struct ublk_device *ub,
 
 	rq_for_each_bvec(bv, rq, iter) {
 		unsigned long pfn = page_to_pfn(bv.bv_page);
+		unsigned long end_pfn = pfn +
+			((bv.bv_offset + bv.bv_len - 1) >> PAGE_SHIFT);
 		struct ublk_buf_range *range;
 		unsigned long off;
+		MA_STATE(mas, &ub->buf_tree, pfn, pfn);
 
-		range = mtree_load(&ub->buf_tree, pfn);
+		range = mas_walk(&mas);
 		if (!range)
 			return false;
 
+		/* verify all pages in this bvec fall within the range */
+		if (end_pfn > mas.last)
+			return false;
+
 		off = range->base_offset +
-			(pfn - range->base_pfn) * PAGE_SIZE + bv.bv_offset;
+			(pfn - mas.index) * PAGE_SIZE + bv.bv_offset;
 
 		if (first) {
 			/* Read-only buffer can't serve READ (kernel writes) */
-- 
2.53.0



* [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
  2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
  2026-04-09 13:30 ` [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation Ming Lei
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

Let the for-loop increment step past the last page of each run
instead of a manual `i++`, and adjust the mtree_insert_range end key
accordingly.
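The resulting loop shape can be sketched in userspace C. count_pfn_runs() is a hypothetical stand-in that models only the run detection, not the maple tree insertion:

```c
#include <assert.h>
#include <stddef.h>

/* Walk an array of PFNs and count maximal runs of consecutive
 * values, letting the for-loop increment step past the last page
 * of each run (as in the simplified __ublk_ctrl_reg_buf loop). */
static int count_pfn_runs(const unsigned long *pfns, size_t n)
{
	int runs = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long pfn = pfns[i];
		size_t start = i;

		/* extend the run while the next PFN is consecutive */
		while (i + 1 < n && pfns[i + 1] == pfn + (i - start) + 1)
			i++;
		/* here [start, i] is one run of i - start + 1 pages */
		runs++;
	}
	return runs;
}
```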

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f990c10e963a..efbb22fe481c 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -5287,7 +5287,7 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
 	unsigned long i;
 	int ret;
 
-	for (i = 0; i < nr_pages; ) {
+	for (i = 0; i < nr_pages; i++) {
 		unsigned long pfn = page_to_pfn(pages[i]);
 		unsigned long start = i;
 		struct ublk_buf_range *range;
@@ -5296,7 +5296,6 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
 		while (i + 1 < nr_pages &&
 		       page_to_pfn(pages[i + 1]) == pfn + (i - start) + 1)
 			i++;
-		i++;	/* past the last page in this run */
 
 		range = kzalloc(sizeof(*range), GFP_KERNEL);
 		if (!range) {
@@ -5308,7 +5307,7 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
 		range->base_offset = start << PAGE_SHIFT;
 
 		ret = mtree_insert_range(&ub->buf_tree, pfn,
-					 pfn + (i - start) - 1,
+					 pfn + (i - start),
 					 range, GFP_KERNEL);
 		if (ret) {
 			kfree(range);
-- 
2.53.0



* [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (2 preceding siblings ...)
  2026-04-09 13:30 ` [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 5/7] ublk: allow buffer registration before device is started Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

Remove struct ublk_buf, whose only field (nr_pages) was never read
after registration. Use an IDA for pure index allocation instead of
an xarray. Make __ublk_ctrl_unreg_buf() return int so the caller
can detect an invalid index without a separate lookup.

Simplify ublk_buf_cleanup() to walk the maple tree directly and
unpin all pages in one pass, instead of iterating the xarray by
buffer index.
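Conceptually, the IDA is just a lowest-free-index allocator. A toy userspace model (not the kernel implementation; names are illustrative) of the alloc/free semantics the patch relies on:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDX 64

/* toy model of ida_alloc()/ida_free(): hand out the lowest free
 * index, and make a freed index available for reuse */
static bool used[MAX_IDX];

static int toy_ida_alloc(void)
{
	for (int i = 0; i < MAX_IDX; i++) {
		if (!used[i]) {
			used[i] = true;
			return i;
		}
	}
	return -1;	/* the kernel IDA would return -ENOSPC */
}

static void toy_ida_free(int i)
{
	used[i] = false;
}
```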

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c | 92 ++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 46 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index efbb22fe481c..8b686e70cf28 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -297,11 +297,6 @@ struct ublk_queue {
 	struct ublk_io ios[] __counted_by(q_depth);
 };
 
-/* Per-registered shared memory buffer */
-struct ublk_buf {
-	unsigned int nr_pages;
-};
-
 /* Maple tree value: maps a PFN range to buffer location */
 struct ublk_buf_range {
 	unsigned short buf_index;
@@ -345,7 +340,7 @@ struct ublk_device {
 
 	/* shared memory zero copy */
 	struct maple_tree	buf_tree;
-	struct xarray		bufs_xa;
+	struct ida		buf_ida;
 
 	struct ublk_queue       *queues[];
 };
@@ -4693,7 +4688,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 	spin_lock_init(&ub->lock);
 	mutex_init(&ub->cancel_mutex);
 	mt_init(&ub->buf_tree);
-	xa_init_flags(&ub->bufs_xa, XA_FLAGS_ALLOC);
+	ida_init(&ub->buf_ida);
 	INIT_WORK(&ub->partition_scan_work, ublk_partition_scan_work);
 
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
@@ -5279,11 +5274,9 @@ static void ublk_buf_erase_ranges(struct ublk_device *ub, int buf_index)
 }
 
 static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
-			       struct ublk_buf *ubuf,
-			       struct page **pages, int index,
-			       unsigned short flags)
+			       struct page **pages, unsigned long nr_pages,
+			       int index, unsigned short flags)
 {
-	unsigned long nr_pages = ubuf->nr_pages;
 	unsigned long i;
 	int ret;
 
@@ -5335,9 +5328,8 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 	struct page **pages = NULL;
 	unsigned int gup_flags;
 	struct gendisk *disk;
-	struct ublk_buf *ubuf;
 	long pinned;
-	u32 index;
+	int index;
 	int ret;
 
 	if (!ublk_dev_support_shmem_zc(ub))
@@ -5367,16 +5359,10 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 		return -ENODEV;
 
 	/* Pin pages before quiescing (may sleep) */
-	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
-	if (!ubuf) {
-		ret = -ENOMEM;
-		goto put_disk;
-	}
-
 	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
 	if (!pages) {
 		ret = -ENOMEM;
-		goto err_free;
+		goto put_disk;
 	}
 
 	gup_flags = FOLL_LONGTERM;
@@ -5392,7 +5378,6 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 		ret = -EFAULT;
 		goto err_unpin;
 	}
-	ubuf->nr_pages = nr_pages;
 
 	/*
 	 * Drain inflight I/O and quiesce the queue so no new requests
@@ -5403,13 +5388,15 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 
 	mutex_lock(&ub->mutex);
 
-	ret = xa_alloc(&ub->bufs_xa, &index, ubuf, xa_limit_16b, GFP_KERNEL);
-	if (ret)
+	index = ida_alloc_max(&ub->buf_ida, USHRT_MAX, GFP_KERNEL);
+	if (index < 0) {
+		ret = index;
 		goto err_unlock;
+	}
 
-	ret = __ublk_ctrl_reg_buf(ub, ubuf, pages, index, buf_reg.flags);
+	ret = __ublk_ctrl_reg_buf(ub, pages, nr_pages, index, buf_reg.flags);
 	if (ret) {
-		xa_erase(&ub->bufs_xa, index);
+		ida_free(&ub->buf_ida, index);
 		goto err_unlock;
 	}
 
@@ -5427,19 +5414,17 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 	unpin_user_pages(pages, pinned);
 err_free_pages:
 	kvfree(pages);
-err_free:
-	kfree(ubuf);
 put_disk:
 	ublk_put_disk(disk);
 	return ret;
 }
 
-static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
-				  struct ublk_buf *ubuf, int buf_index)
+static int __ublk_ctrl_unreg_buf(struct ublk_device *ub, int buf_index)
 {
 	MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX);
 	struct ublk_buf_range *range;
 	struct page *pages[32];
+	int ret = -ENOENT;
 
 	mas_lock(&mas);
 	mas_for_each(&mas, range, ULONG_MAX) {
@@ -5448,6 +5433,7 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
 		if (range->buf_index != buf_index)
 			continue;
 
+		ret = 0;
 		base = mas.index;
 		nr = mas.last - base + 1;
 		mas_erase(&mas);
@@ -5465,7 +5451,8 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
 		kfree(range);
 	}
 	mas_unlock(&mas);
-	kfree(ubuf);
+
+	return ret;
 }
 
 static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
@@ -5473,11 +5460,14 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
 {
 	int index = (int)header->data[0];
 	struct gendisk *disk;
-	struct ublk_buf *ubuf;
+	int ret;
 
 	if (!ublk_dev_support_shmem_zc(ub))
 		return -EOPNOTSUPP;
 
+	if (index < 0 || index > USHRT_MAX)
+		return -EINVAL;
+
 	disk = ublk_get_disk(ub);
 	if (!disk)
 		return -ENODEV;
@@ -5487,32 +5477,42 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
 
 	mutex_lock(&ub->mutex);
 
-	ubuf = xa_erase(&ub->bufs_xa, index);
-	if (!ubuf) {
-		mutex_unlock(&ub->mutex);
-		ublk_unquiesce_and_resume(disk);
-		ublk_put_disk(disk);
-		return -ENOENT;
-	}
-
-	__ublk_ctrl_unreg_buf(ub, ubuf, index);
+	ret = __ublk_ctrl_unreg_buf(ub, index);
+	if (!ret)
+		ida_free(&ub->buf_ida, index);
 
 	mutex_unlock(&ub->mutex);
 
 	ublk_unquiesce_and_resume(disk);
 	ublk_put_disk(disk);
-	return 0;
+	return ret;
 }
 
 static void ublk_buf_cleanup(struct ublk_device *ub)
 {
-	struct ublk_buf *ubuf;
-	unsigned long index;
+	MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX);
+	struct ublk_buf_range *range;
+	struct page *pages[32];
+
+	mas_for_each(&mas, range, ULONG_MAX) {
+		unsigned long base = mas.index;
+		unsigned long nr = mas.last - base + 1;
+		unsigned long off;
 
-	xa_for_each(&ub->bufs_xa, index, ubuf)
-		__ublk_ctrl_unreg_buf(ub, ubuf, index);
-	xa_destroy(&ub->bufs_xa);
+		for (off = 0; off < nr; ) {
+			unsigned int batch = min_t(unsigned long,
+						   nr - off, 32);
+			unsigned int j;
+
+			for (j = 0; j < batch; j++)
+				pages[j] = pfn_to_page(base + off + j);
+			unpin_user_pages(pages, batch);
+			off += batch;
+		}
+		kfree(range);
+	}
 	mtree_destroy(&ub->buf_tree);
+	ida_destroy(&ub->buf_ida);
 }
 
 /* Check if request pages match a registered shared memory buffer */
-- 
2.53.0



* [PATCH 5/7] ublk: allow buffer registration before device is started
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (3 preceding siblings ...)
  2026-04-09 13:30 ` [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

Before START_DEV there is no disk, no queue, and no I/O dispatch, so
the maple tree can be safely modified under ub->mutex alone, without
freezing the queue.

Add ublk_lock_buf_tree()/ublk_unlock_buf_tree() helpers that take
ub->mutex first, then freeze the queue if the device is started.
This ordering (mutex -> freeze) is safe because
ublk_stop_dev_unlocked() already holds ub->mutex when calling
del_gendisk(), which freezes the queue.
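The conditional lock/unlock pairing can be modeled in userspace C. A hedged sketch (function and event names are illustrative, not the driver's) showing that freeze happens only when started, always nested inside the mutex:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* record lock/unlock events so the ordering can be checked */
static char seq[64];

static void record(const char *ev) { strcat(seq, ev); }

static void toy_lock_buf_tree(bool started)
{
	record("M");		/* mutex_lock(&ub->mutex) */
	if (started)
		record("F");	/* blk_mq_freeze_queue() */
}

static void toy_unlock_buf_tree(bool started)
{
	if (started)
		record("U");	/* blk_mq_unfreeze_queue() */
	record("m");		/* mutex_unlock(&ub->mutex) */
}
```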

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/ublk_drv.c | 82 +++++++++++++---------------------------
 1 file changed, 27 insertions(+), 55 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 8b686e70cf28..79178f13f198 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -5233,30 +5233,31 @@ static int ublk_char_dev_permission(struct ublk_device *ub,
 }
 
 /*
- * Drain inflight I/O and quiesce the queue. Freeze drains all inflight
- * requests, quiesce_nowait marks the queue so no new requests dispatch,
- * then unfreeze allows new submissions (which won't dispatch due to
- * quiesce). This keeps freeze and ub->mutex non-nested.
+ * Lock for maple tree modification: acquire ub->mutex, then freeze queue
+ * if device is started. If device is not yet started, only mutex is
+ * needed since no I/O path can access the tree.
+ *
+ * This ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked()
+ * already holds ub->mutex when calling del_gendisk() which freezes the queue.
  */
-static void ublk_quiesce_and_release(struct gendisk *disk)
+static unsigned int ublk_lock_buf_tree(struct ublk_device *ub)
 {
-	unsigned int memflags;
+	unsigned int memflags = 0;
 
-	memflags = blk_mq_freeze_queue(disk->queue);
-	blk_mq_quiesce_queue_nowait(disk->queue);
-	blk_mq_unfreeze_queue(disk->queue, memflags);
+	mutex_lock(&ub->mutex);
+	if (ub->ub_disk)
+		memflags = blk_mq_freeze_queue(ub->ub_disk->queue);
+
+	return memflags;
 }
 
-static void ublk_unquiesce_and_resume(struct gendisk *disk)
+static void ublk_unlock_buf_tree(struct ublk_device *ub, unsigned int memflags)
 {
-	blk_mq_unquiesce_queue(disk->queue);
+	if (ub->ub_disk)
+		blk_mq_unfreeze_queue(ub->ub_disk->queue, memflags);
+	mutex_unlock(&ub->mutex);
 }
 
-/*
- * Insert PFN ranges of a registered buffer into the maple tree,
- * coalescing consecutive PFNs into single range entries.
- * Returns 0 on success, negative error with partial insertions unwound.
- */
 /* Erase coalesced PFN ranges from the maple tree matching buf_index */
 static void ublk_buf_erase_ranges(struct ublk_device *ub, int buf_index)
 {
@@ -5327,7 +5328,7 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 	unsigned long addr, size, nr_pages;
 	struct page **pages = NULL;
 	unsigned int gup_flags;
-	struct gendisk *disk;
+	unsigned int memflags;
 	long pinned;
 	int index;
 	int ret;
@@ -5354,16 +5355,10 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 	    !PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
 		return -EINVAL;
 
-	disk = ublk_get_disk(ub);
-	if (!disk)
-		return -ENODEV;
-
-	/* Pin pages before quiescing (may sleep) */
+	/* Pin pages before any locks (may sleep) */
 	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
-	if (!pages) {
-		ret = -ENOMEM;
-		goto put_disk;
-	}
+	if (!pages)
+		return -ENOMEM;
 
 	gup_flags = FOLL_LONGTERM;
 	if (!(buf_reg.flags & UBLK_SHMEM_BUF_READ_ONLY))
@@ -5379,14 +5374,7 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 		goto err_unpin;
 	}
 
-	/*
-	 * Drain inflight I/O and quiesce the queue so no new requests
-	 * are dispatched while we modify the maple tree. Keep freeze
-	 * and mutex non-nested to avoid lock dependency.
-	 */
-	ublk_quiesce_and_release(disk);
-
-	mutex_lock(&ub->mutex);
+	memflags = ublk_lock_buf_tree(ub);
 
 	index = ida_alloc_max(&ub->buf_ida, USHRT_MAX, GFP_KERNEL);
 	if (index < 0) {
@@ -5400,22 +5388,16 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
 		goto err_unlock;
 	}
 
-	mutex_unlock(&ub->mutex);
-
+	ublk_unlock_buf_tree(ub, memflags);
 	kvfree(pages);
-	ublk_unquiesce_and_resume(disk);
-	ublk_put_disk(disk);
 	return index;
 
 err_unlock:
-	mutex_unlock(&ub->mutex);
-	ublk_unquiesce_and_resume(disk);
+	ublk_unlock_buf_tree(ub, memflags);
 err_unpin:
 	unpin_user_pages(pages, pinned);
 err_free_pages:
 	kvfree(pages);
-put_disk:
-	ublk_put_disk(disk);
 	return ret;
 }
 
@@ -5459,7 +5441,7 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
 			       struct ublksrv_ctrl_cmd *header)
 {
 	int index = (int)header->data[0];
-	struct gendisk *disk;
+	unsigned int memflags;
 	int ret;
 
 	if (!ublk_dev_support_shmem_zc(ub))
@@ -5468,23 +5450,13 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
 	if (index < 0 || index > USHRT_MAX)
 		return -EINVAL;
 
-	disk = ublk_get_disk(ub);
-	if (!disk)
-		return -ENODEV;
-
-	/* Drain inflight I/O before modifying the maple tree */
-	ublk_quiesce_and_release(disk);
-
-	mutex_lock(&ub->mutex);
+	memflags = ublk_lock_buf_tree(ub);
 
 	ret = __ublk_ctrl_unreg_buf(ub, index);
 	if (!ret)
 		ida_free(&ub->buf_ida, index);
 
-	mutex_unlock(&ub->mutex);
-
-	ublk_unquiesce_and_resume(disk);
-	ublk_put_disk(disk);
+	ublk_unlock_buf_tree(ub, memflags);
 	return ret;
 }
 
-- 
2.53.0



* [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (4 preceding siblings ...)
  2026-04-09 13:30 ` [PATCH 5/7] ublk: allow buffer registration before device is started Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-09 13:30 ` [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

- Use "physical pages" instead of "page frame numbers (PFNs)" for
  clarity
- Remove "without any per-I/O overhead" claim from zero-copy
  description
- Add scatter/gather limitation: each I/O's data must be contiguous
  within a single registered buffer

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 Documentation/block/ublk.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index a818e09a4b66..c39d111af2d2 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -492,8 +492,8 @@ The ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path
 that works by sharing physical memory pages between the client application
 and the ublk server. Unlike the io_uring fixed buffer approach above,
 shared memory zero copy does not require io_uring buffer registration
-per I/O — instead, it relies on the kernel matching page frame numbers
-(PFNs) at I/O time. This allows the ublk server to access the shared
+per I/O — instead, it relies on the kernel matching physical pages
+at I/O time. This allows the ublk server to access the shared
 buffer directly, which is unlikely for the io_uring fixed buffer
 approach.
 
@@ -507,8 +507,7 @@ tells the server where the data already lives.
 
 ``UBLK_F_SHMEM_ZC`` can be thought of as a supplement for optimized client
 applications — when the client is willing to allocate I/O buffers from
-shared memory, the entire data path becomes zero-copy without any per-I/O
-overhead.
+shared memory, the entire data path becomes zero-copy.
 
 Use Cases
 ~~~~~~~~~
@@ -584,6 +583,9 @@ Limitations
   the page cache, which allocates its own pages. These kernel-allocated
   pages will never match the registered shared buffer. Only ``O_DIRECT``
   puts the client's buffer pages directly into the block I/O.
+- **Contiguous data only**: each I/O request's data must be contiguous
+  within a single registered buffer. Scatter/gather I/O that spans
+  multiple non-adjacent registered buffers cannot use the zero-copy path.
 
 Control Commands
 ~~~~~~~~~~~~~~~~
-- 
2.53.0



* [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (5 preceding siblings ...)
  2026-04-09 13:30 ` [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
  2026-04-10  1:11 ` [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Jens Axboe
  2026-04-10  1:12 ` Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei

Update the ublk userspace block driver maintainer email address from
ming.lei@redhat.com to tom.leiming@gmail.com, as the old address
will become invalid.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77fdfcb55f06..4abb3345bc4e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26992,7 +26992,7 @@ F:	Documentation/filesystems/ubifs.rst
 F:	fs/ubifs/
 
 UBLK USERSPACE BLOCK DRIVER
-M:	Ming Lei <ming.lei@redhat.com>
+M:	Ming Lei <tom.leiming@gmail.com>
 L:	linux-block@vger.kernel.org
 S:	Maintained
 F:	Documentation/block/ublk.rst
-- 
2.53.0



* Re: [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (6 preceding siblings ...)
  2026-04-09 13:30 ` [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email Ming Lei
@ 2026-04-10  1:11 ` Jens Axboe
  2026-04-10  1:12 ` Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-04-10  1:11 UTC (permalink / raw)
  To: linux-block, Ming Lei; +Cc: Caleb Sander Mateos


On Thu, 09 Apr 2026 21:30:12 +0800, Ming Lei wrote:
> Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
> addressing review feedback from Caleb Sander Mateos.
> 
> - Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
>   (the __u32 field overflowed to 0 for exactly 4GB)
> - Verify all pages in multi-page bvecs fall within the registered maple
>   tree range, removing base_pfn from ublk_buf_range since mas.index
>   provides the range start PFN
> - Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
> - Replace xarray with IDA for buffer index allocation, removing the
>   unnecessary struct ublk_buf
> - Allow buffer registration before device is started by taking ub->mutex
>   before freezing the queue (same ordering as ublk_stop_dev_unlocked)
> - Address documentation review comments
> - Update MAINTAINERS email
> 
> [...]

Applied, thanks!

[1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
      commit: 23b3b6f0b584b70a427d5bb826d320151890d7da
[2/7] ublk: verify all pages in multi-page bvec fall within registered range
      commit: 211ff1602b67e26125977f8b2f369d7c2847628c
[3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
      commit: 8ea8566a9aeef746699d8c84bed3ac44edbfaa0e
[4/7] ublk: replace xarray with IDA for shmem buffer index allocation
      commit: 5e864438e2853ef5112d7905fadcc3877e2be70a
[5/7] ublk: allow buffer registration before device is started
      commit: 365ea7cc62447caac508706b429cdf031cc15a9f
[6/7] Documentation: ublk: address review comments for SHMEM_ZC docs
      commit: 289653bb76c46149f88939c3cfef55cdb236ace2
[7/7] MAINTAINERS: update ublk driver maintainer email
      commit: b774765fb804045ee774476ded8e52482ae5ecb7

Best regards,
-- 
Jens Axboe





* Re: [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
  2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
                   ` (7 preceding siblings ...)
  2026-04-10  1:11 ` [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Jens Axboe
@ 2026-04-10  1:12 ` Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-04-10  1:12 UTC (permalink / raw)
  To: Ming Lei, linux-block; +Cc: Caleb Sander Mateos

On 4/9/26 7:30 AM, Ming Lei wrote:
> Hello Jens,
> 
> Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
> addressing review feedback from Caleb Sander Mateos.
> 
> - Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
>   (the __u32 field overflowed to 0 for exactly 4GB)
> - Verify all pages in multi-page bvecs fall within the registered maple
>   tree range, removing base_pfn from ublk_buf_range since mas.index
>   provides the range start PFN
> - Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
> - Replace xarray with IDA for buffer index allocation, removing the
>   unnecessary struct ublk_buf
> - Allow buffer registration before device is started by taking ub->mutex
>   before freezing the queue (same ordering as ublk_stop_dev_unlocked)
> - Address documentation review comments
> - Update MAINTAINERS email

Applied, but I'm unsure what base you used. I'm guessing it was your
old base rather than my for-7.1/block branch, which already had fixups
for the issues I mentioned. In any case, please ensure that future
updates are against the actual tree, not on top of your previous tree.

-- 
Jens Axboe

