* [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
@ 2026-04-09 13:30 Ming Lei
2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
Hello Jens,
Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
addressing review feedback from Caleb Sander Mateos.
- Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
(the __u32 field overflowed to 0 for exactly 4GB)
- Verify all pages in multi-page bvecs fall within the registered maple
tree range, removing base_pfn from ublk_buf_range since mas.index
provides the range start PFN
- Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
- Replace xarray with IDA for buffer index allocation, removing the
unnecessary struct ublk_buf
- Allow buffer registration before device is started by taking ub->mutex
before freezing the queue (same ordering as ublk_stop_dev_unlocked)
- Address documentation review comments
- Update MAINTAINERS email
Thanks,
Ming Lei (7):
ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
ublk: verify all pages in multi-page bvec fall within registered range
ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
ublk: replace xarray with IDA for shmem buffer index allocation
ublk: allow buffer registration before device is started
Documentation: ublk: address review comments for SHMEM_ZC docs
MAINTAINERS: update ublk driver maintainer email
Documentation/block/ublk.rst | 10 +-
MAINTAINERS | 2 +-
drivers/block/ublk_drv.c | 201 ++++++++++++++++------------------
include/uapi/linux/ublk_cmd.h | 3 +-
4 files changed, 101 insertions(+), 115 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range Ming Lei
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
The __u32 len field cannot represent a 4GB buffer (0x100000000
overflows to 0). Change it to __u64 so buffers up to 4GB can be
registered. Add a reserved field for alignment and validate it
is zero.
The kernel enforces a default maximum of 4GB (UBLK_SHMEM_BUF_SIZE_MAX),
which may be increased in the future.
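As a reference, the truncation described above can be reproduced in a tiny userspace sketch (illustrative only; store_len_u32()/store_len_u64() are hypothetical helpers modeling the old and new field widths, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Storing a 4GB length (1ULL << 32) in a 32-bit field truncates to 0,
 * making a full 4GB buffer unrepresentable; a 64-bit field holds it. */
static uint32_t store_len_u32(uint64_t len)
{
	return (uint32_t)len;	/* 1ULL << 32 truncates to 0 */
}

static uint64_t store_len_u64(uint64_t len)
{
	return len;		/* 4GB is representable exactly */
}
```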
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 9 ++++++++-
include/uapi/linux/ublk_cmd.h | 3 ++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2e475bdc54dd..ada9a2e32ea9 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -63,6 +63,9 @@
#define UBLK_CMD_REG_BUF _IOC_NR(UBLK_U_CMD_REG_BUF)
#define UBLK_CMD_UNREG_BUF _IOC_NR(UBLK_U_CMD_UNREG_BUF)
+/* Default max shmem buffer size: 4GB (may be increased in future) */
+#define UBLK_SHMEM_BUF_SIZE_MAX (1ULL << 32)
+
#define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
#define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
@@ -5351,11 +5354,15 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
if (buf_reg.flags & ~UBLK_SHMEM_BUF_READ_ONLY)
return -EINVAL;
+ if (buf_reg.reserved)
+ return -EINVAL;
+
addr = buf_reg.addr;
size = buf_reg.len;
nr_pages = size >> PAGE_SHIFT;
- if (!size || !PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
+ if (!size || size > UBLK_SHMEM_BUF_SIZE_MAX ||
+ !PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
return -EINVAL;
disk = ublk_get_disk(ub);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index ecd258847d3d..66d93efccd51 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -89,8 +89,9 @@
/* Parameter buffer for UBLK_U_CMD_REG_BUF, pointed to by ctrl_cmd.addr */
struct ublk_shmem_buf_reg {
__u64 addr; /* userspace virtual address of shared memory */
- __u32 len; /* buffer size in bytes (page-aligned, max 4GB) */
+ __u64 len; /* buffer size in bytes, page-aligned, default max 4GB */
__u32 flags;
+ __u32 reserved;
};
/* Pin pages without FOLL_WRITE; usable with write-sealed memfd */
--
2.53.0
* [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf Ming Lei
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
rq_for_each_bvec() yields multi-page bvecs where bv_page is only the
first page. ublk_try_buf_match() only validated the start PFN against
the maple tree, but a bvec can span multiple pages past the end of a
registered range.
Use mas_walk() instead of mtree_load() to obtain the range boundaries
stored in the maple tree, and check that the bvec's end PFN does not
exceed the range. Also remove base_pfn from struct ublk_buf_range
since mas.index already provides the range start PFN.
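The end-PFN arithmetic can be sketched in userspace (TOY_PAGE_SHIFT hardcoded to 12 for illustration; bvec_end_pfn() is a hypothetical helper mirroring the check, not a kernel function):

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* 4KB pages, for illustration */

/* PFN of the last byte of a bvec that starts at byte offset `off` into
 * the page at `pfn` and spans `len` bytes; a bvec crossing a page
 * boundary ends on a later PFN than it starts on. */
static unsigned long bvec_end_pfn(unsigned long pfn, unsigned int off,
				  unsigned int len)
{
	return pfn + ((off + len - 1) >> TOY_PAGE_SHIFT);
}
```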
Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index ada9a2e32ea9..f990c10e963a 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -304,7 +304,6 @@ struct ublk_buf {
/* Maple tree value: maps a PFN range to buffer location */
struct ublk_buf_range {
- unsigned long base_pfn;
unsigned short buf_index;
unsigned short flags;
unsigned int base_offset; /* byte offset within buffer */
@@ -5306,7 +5305,6 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
}
range->buf_index = index;
range->flags = flags;
- range->base_pfn = pfn;
range->base_offset = start << PAGE_SHIFT;
ret = mtree_insert_range(&ub->buf_tree, pfn,
@@ -5451,8 +5449,8 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
if (range->buf_index != buf_index)
continue;
- base = range->base_pfn;
- nr = mas.last - mas.index + 1;
+ base = mas.index;
+ nr = mas.last - base + 1;
mas_erase(&mas);
for (off = 0; off < nr; ) {
@@ -5531,15 +5529,22 @@ static bool ublk_try_buf_match(struct ublk_device *ub,
rq_for_each_bvec(bv, rq, iter) {
unsigned long pfn = page_to_pfn(bv.bv_page);
+ unsigned long end_pfn = pfn +
+ ((bv.bv_offset + bv.bv_len - 1) >> PAGE_SHIFT);
struct ublk_buf_range *range;
unsigned long off;
+ MA_STATE(mas, &ub->buf_tree, pfn, pfn);
- range = mtree_load(&ub->buf_tree, pfn);
+ range = mas_walk(&mas);
if (!range)
return false;
+ /* verify all pages in this bvec fall within the range */
+ if (end_pfn > mas.last)
+ return false;
+
off = range->base_offset +
- (pfn - range->base_pfn) * PAGE_SIZE + bv.bv_offset;
+ (pfn - mas.index) * PAGE_SIZE + bv.bv_offset;
if (first) {
/* Read-only buffer can't serve READ (kernel writes) */
--
2.53.0
* [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
2026-04-09 13:30 ` [PATCH 1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support Ming Lei
2026-04-09 13:30 ` [PATCH 2/7] ublk: verify all pages in multi-page bvec fall within registered range Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation Ming Lei
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
Use the for-loop increment instead of a manual `i++` past the last
page, and fix the mtree_insert_range end key accordingly.
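The reworked loop shape can be modeled in userspace; count_runs() below is a hypothetical stand-in that coalesces consecutive PFNs the same way, with the for-loop increment advancing past each run:

```c
#include <assert.h>

/* Count runs of consecutive PFNs; mirrors the fixed loop shape where
 * `i` is left on the last page of a run and the for-loop increment
 * moves past it. */
static int count_runs(const unsigned long *pfns, unsigned long nr)
{
	int runs = 0;

	for (unsigned long i = 0; i < nr; i++) {
		unsigned long pfn = pfns[i];
		unsigned long start = i;

		while (i + 1 < nr && pfns[i + 1] == pfn + (i - start) + 1)
			i++;
		/* the run covers pfn .. pfn + (i - start), inclusive */
		runs++;
	}
	return runs;
}
```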
Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f990c10e963a..efbb22fe481c 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -5287,7 +5287,7 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
unsigned long i;
int ret;
- for (i = 0; i < nr_pages; ) {
+ for (i = 0; i < nr_pages; i++) {
unsigned long pfn = page_to_pfn(pages[i]);
unsigned long start = i;
struct ublk_buf_range *range;
@@ -5296,7 +5296,6 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
while (i + 1 < nr_pages &&
page_to_pfn(pages[i + 1]) == pfn + (i - start) + 1)
i++;
- i++; /* past the last page in this run */
range = kzalloc(sizeof(*range), GFP_KERNEL);
if (!range) {
@@ -5308,7 +5307,7 @@ static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
range->base_offset = start << PAGE_SHIFT;
ret = mtree_insert_range(&ub->buf_tree, pfn,
- pfn + (i - start) - 1,
+ pfn + (i - start),
range, GFP_KERNEL);
if (ret) {
kfree(range);
--
2.53.0
* [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (2 preceding siblings ...)
2026-04-09 13:30 ` [PATCH 3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 5/7] ublk: allow buffer registration before device is started Ming Lei
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
Remove struct ublk_buf, whose only field, nr_pages, was never read
after registration. Use an IDA for pure index allocation instead of
an xarray. Make __ublk_ctrl_unreg_buf() return int so the caller
can detect an invalid index without a separate lookup.
Simplify ublk_buf_cleanup() to walk the maple tree directly and
unpin all pages in one pass, instead of iterating the xarray by
buffer index.
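A minimal userspace model of the ida_alloc_max()/ida_free() semantics relied on here (allocate the lowest free index up to a max); the toy_* names are illustrative, not kernel API:

```c
#include <assert.h>

#define TOY_IDA_MAX 64
static unsigned char toy_used[TOY_IDA_MAX + 1];

/* Allocate the smallest free index in [0, max], like ida_alloc_max(). */
static int toy_ida_alloc_max(int max)
{
	for (int i = 0; i <= max && i <= TOY_IDA_MAX; i++) {
		if (!toy_used[i]) {
			toy_used[i] = 1;
			return i;
		}
	}
	return -1;	/* the kernel returns -ENOSPC */
}

/* Return an index to the pool, like ida_free(). */
static void toy_ida_free(int id)
{
	toy_used[id] = 0;
}
```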
Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 92 ++++++++++++++++++++--------------------
1 file changed, 46 insertions(+), 46 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index efbb22fe481c..8b686e70cf28 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -297,11 +297,6 @@ struct ublk_queue {
struct ublk_io ios[] __counted_by(q_depth);
};
-/* Per-registered shared memory buffer */
-struct ublk_buf {
- unsigned int nr_pages;
-};
-
/* Maple tree value: maps a PFN range to buffer location */
struct ublk_buf_range {
unsigned short buf_index;
@@ -345,7 +340,7 @@ struct ublk_device {
/* shared memory zero copy */
struct maple_tree buf_tree;
- struct xarray bufs_xa;
+ struct ida buf_ida;
struct ublk_queue *queues[];
};
@@ -4693,7 +4688,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
spin_lock_init(&ub->lock);
mutex_init(&ub->cancel_mutex);
mt_init(&ub->buf_tree);
- xa_init_flags(&ub->bufs_xa, XA_FLAGS_ALLOC);
+ ida_init(&ub->buf_ida);
INIT_WORK(&ub->partition_scan_work, ublk_partition_scan_work);
ret = ublk_alloc_dev_number(ub, header->dev_id);
@@ -5279,11 +5274,9 @@ static void ublk_buf_erase_ranges(struct ublk_device *ub, int buf_index)
}
static int __ublk_ctrl_reg_buf(struct ublk_device *ub,
- struct ublk_buf *ubuf,
- struct page **pages, int index,
- unsigned short flags)
+ struct page **pages, unsigned long nr_pages,
+ int index, unsigned short flags)
{
- unsigned long nr_pages = ubuf->nr_pages;
unsigned long i;
int ret;
@@ -5335,9 +5328,8 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
struct page **pages = NULL;
unsigned int gup_flags;
struct gendisk *disk;
- struct ublk_buf *ubuf;
long pinned;
- u32 index;
+ int index;
int ret;
if (!ublk_dev_support_shmem_zc(ub))
@@ -5367,16 +5359,10 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
return -ENODEV;
/* Pin pages before quiescing (may sleep) */
- ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
- if (!ubuf) {
- ret = -ENOMEM;
- goto put_disk;
- }
-
pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
if (!pages) {
ret = -ENOMEM;
- goto err_free;
+ goto put_disk;
}
gup_flags = FOLL_LONGTERM;
@@ -5392,7 +5378,6 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
ret = -EFAULT;
goto err_unpin;
}
- ubuf->nr_pages = nr_pages;
/*
* Drain inflight I/O and quiesce the queue so no new requests
@@ -5403,13 +5388,15 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
mutex_lock(&ub->mutex);
- ret = xa_alloc(&ub->bufs_xa, &index, ubuf, xa_limit_16b, GFP_KERNEL);
- if (ret)
+ index = ida_alloc_max(&ub->buf_ida, USHRT_MAX, GFP_KERNEL);
+ if (index < 0) {
+ ret = index;
goto err_unlock;
+ }
- ret = __ublk_ctrl_reg_buf(ub, ubuf, pages, index, buf_reg.flags);
+ ret = __ublk_ctrl_reg_buf(ub, pages, nr_pages, index, buf_reg.flags);
if (ret) {
- xa_erase(&ub->bufs_xa, index);
+ ida_free(&ub->buf_ida, index);
goto err_unlock;
}
@@ -5427,19 +5414,17 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
unpin_user_pages(pages, pinned);
err_free_pages:
kvfree(pages);
-err_free:
- kfree(ubuf);
put_disk:
ublk_put_disk(disk);
return ret;
}
-static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
- struct ublk_buf *ubuf, int buf_index)
+static int __ublk_ctrl_unreg_buf(struct ublk_device *ub, int buf_index)
{
MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX);
struct ublk_buf_range *range;
struct page *pages[32];
+ int ret = -ENOENT;
mas_lock(&mas);
mas_for_each(&mas, range, ULONG_MAX) {
@@ -5448,6 +5433,7 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
if (range->buf_index != buf_index)
continue;
+ ret = 0;
base = mas.index;
nr = mas.last - base + 1;
mas_erase(&mas);
@@ -5465,7 +5451,8 @@ static void __ublk_ctrl_unreg_buf(struct ublk_device *ub,
kfree(range);
}
mas_unlock(&mas);
- kfree(ubuf);
+
+ return ret;
}
static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
@@ -5473,11 +5460,14 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
{
int index = (int)header->data[0];
struct gendisk *disk;
- struct ublk_buf *ubuf;
+ int ret;
if (!ublk_dev_support_shmem_zc(ub))
return -EOPNOTSUPP;
+ if (index < 0 || index > USHRT_MAX)
+ return -EINVAL;
+
disk = ublk_get_disk(ub);
if (!disk)
return -ENODEV;
@@ -5487,32 +5477,42 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
mutex_lock(&ub->mutex);
- ubuf = xa_erase(&ub->bufs_xa, index);
- if (!ubuf) {
- mutex_unlock(&ub->mutex);
- ublk_unquiesce_and_resume(disk);
- ublk_put_disk(disk);
- return -ENOENT;
- }
-
- __ublk_ctrl_unreg_buf(ub, ubuf, index);
+ ret = __ublk_ctrl_unreg_buf(ub, index);
+ if (!ret)
+ ida_free(&ub->buf_ida, index);
mutex_unlock(&ub->mutex);
ublk_unquiesce_and_resume(disk);
ublk_put_disk(disk);
- return 0;
+ return ret;
}
static void ublk_buf_cleanup(struct ublk_device *ub)
{
- struct ublk_buf *ubuf;
- unsigned long index;
+ MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX);
+ struct ublk_buf_range *range;
+ struct page *pages[32];
+
+ mas_for_each(&mas, range, ULONG_MAX) {
+ unsigned long base = mas.index;
+ unsigned long nr = mas.last - base + 1;
+ unsigned long off;
- xa_for_each(&ub->bufs_xa, index, ubuf)
- __ublk_ctrl_unreg_buf(ub, ubuf, index);
- xa_destroy(&ub->bufs_xa);
+ for (off = 0; off < nr; ) {
+ unsigned int batch = min_t(unsigned long,
+ nr - off, 32);
+ unsigned int j;
+
+ for (j = 0; j < batch; j++)
+ pages[j] = pfn_to_page(base + off + j);
+ unpin_user_pages(pages, batch);
+ off += batch;
+ }
+ kfree(range);
+ }
mtree_destroy(&ub->buf_tree);
+ ida_destroy(&ub->buf_ida);
}
/* Check if request pages match a registered shared memory buffer */
--
2.53.0
* [PATCH 5/7] ublk: allow buffer registration before device is started
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (3 preceding siblings ...)
2026-04-09 13:30 ` [PATCH 4/7] ublk: replace xarray with IDA for shmem buffer index allocation Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs Ming Lei
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
Before START_DEV there is no disk, no queue, and no I/O dispatch, so
the maple tree can be safely modified under ub->mutex alone, without
freezing the queue.
Add ublk_lock_buf_tree()/ublk_unlock_buf_tree() helpers that take
ub->mutex first, then freeze the queue if the device is started. This
ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked()
already holds ub->mutex when calling del_gendisk(), which freezes
the queue.
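The conditional freeze can be sketched as a toy state model (struct toy_dev and the toy_* helpers are illustrative, not kernel API): the mutex is always taken first, and the queue is frozen only when a disk exists.

```c
#include <assert.h>

struct toy_dev {
	int has_disk;	/* set once the device is started */
	int mutex_held;
	int frozen;
};

/* Models ublk_lock_buf_tree(): mutex first, then freeze if started. */
static int toy_lock_buf_tree(struct toy_dev *d)
{
	d->mutex_held = 1;		/* mutex_lock(&ub->mutex) */
	if (d->has_disk)
		d->frozen = 1;		/* blk_mq_freeze_queue() */
	return d->frozen;
}

/* Models ublk_unlock_buf_tree(): unfreeze (if frozen), then unlock. */
static void toy_unlock_buf_tree(struct toy_dev *d)
{
	if (d->has_disk)
		d->frozen = 0;		/* blk_mq_unfreeze_queue() */
	d->mutex_held = 0;		/* mutex_unlock(&ub->mutex) */
}
```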
Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 82 +++++++++++++---------------------------
1 file changed, 27 insertions(+), 55 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 8b686e70cf28..79178f13f198 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -5233,30 +5233,31 @@ static int ublk_char_dev_permission(struct ublk_device *ub,
}
/*
- * Drain inflight I/O and quiesce the queue. Freeze drains all inflight
- * requests, quiesce_nowait marks the queue so no new requests dispatch,
- * then unfreeze allows new submissions (which won't dispatch due to
- * quiesce). This keeps freeze and ub->mutex non-nested.
+ * Lock for maple tree modification: acquire ub->mutex, then freeze queue
+ * if device is started. If device is not yet started, only mutex is
+ * needed since no I/O path can access the tree.
+ *
+ * This ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked()
+ * already holds ub->mutex when calling del_gendisk() which freezes the queue.
*/
-static void ublk_quiesce_and_release(struct gendisk *disk)
+static unsigned int ublk_lock_buf_tree(struct ublk_device *ub)
{
- unsigned int memflags;
+ unsigned int memflags = 0;
- memflags = blk_mq_freeze_queue(disk->queue);
- blk_mq_quiesce_queue_nowait(disk->queue);
- blk_mq_unfreeze_queue(disk->queue, memflags);
+ mutex_lock(&ub->mutex);
+ if (ub->ub_disk)
+ memflags = blk_mq_freeze_queue(ub->ub_disk->queue);
+
+ return memflags;
}
-static void ublk_unquiesce_and_resume(struct gendisk *disk)
+static void ublk_unlock_buf_tree(struct ublk_device *ub, unsigned int memflags)
{
- blk_mq_unquiesce_queue(disk->queue);
+ if (ub->ub_disk)
+ blk_mq_unfreeze_queue(ub->ub_disk->queue, memflags);
+ mutex_unlock(&ub->mutex);
}
-/*
- * Insert PFN ranges of a registered buffer into the maple tree,
- * coalescing consecutive PFNs into single range entries.
- * Returns 0 on success, negative error with partial insertions unwound.
- */
/* Erase coalesced PFN ranges from the maple tree matching buf_index */
static void ublk_buf_erase_ranges(struct ublk_device *ub, int buf_index)
{
@@ -5327,7 +5328,7 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
unsigned long addr, size, nr_pages;
struct page **pages = NULL;
unsigned int gup_flags;
- struct gendisk *disk;
+ unsigned int memflags;
long pinned;
int index;
int ret;
@@ -5354,16 +5355,10 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
!PAGE_ALIGNED(size) || !PAGE_ALIGNED(addr))
return -EINVAL;
- disk = ublk_get_disk(ub);
- if (!disk)
- return -ENODEV;
-
- /* Pin pages before quiescing (may sleep) */
+ /* Pin pages before any locks (may sleep) */
pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
- if (!pages) {
- ret = -ENOMEM;
- goto put_disk;
- }
+ if (!pages)
+ return -ENOMEM;
gup_flags = FOLL_LONGTERM;
if (!(buf_reg.flags & UBLK_SHMEM_BUF_READ_ONLY))
@@ -5379,14 +5374,7 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
goto err_unpin;
}
- /*
- * Drain inflight I/O and quiesce the queue so no new requests
- * are dispatched while we modify the maple tree. Keep freeze
- * and mutex non-nested to avoid lock dependency.
- */
- ublk_quiesce_and_release(disk);
-
- mutex_lock(&ub->mutex);
+ memflags = ublk_lock_buf_tree(ub);
index = ida_alloc_max(&ub->buf_ida, USHRT_MAX, GFP_KERNEL);
if (index < 0) {
@@ -5400,22 +5388,16 @@ static int ublk_ctrl_reg_buf(struct ublk_device *ub,
goto err_unlock;
}
- mutex_unlock(&ub->mutex);
-
+ ublk_unlock_buf_tree(ub, memflags);
kvfree(pages);
- ublk_unquiesce_and_resume(disk);
- ublk_put_disk(disk);
return index;
err_unlock:
- mutex_unlock(&ub->mutex);
- ublk_unquiesce_and_resume(disk);
+ ublk_unlock_buf_tree(ub, memflags);
err_unpin:
unpin_user_pages(pages, pinned);
err_free_pages:
kvfree(pages);
-put_disk:
- ublk_put_disk(disk);
return ret;
}
@@ -5459,7 +5441,7 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
struct ublksrv_ctrl_cmd *header)
{
int index = (int)header->data[0];
- struct gendisk *disk;
+ unsigned int memflags;
int ret;
if (!ublk_dev_support_shmem_zc(ub))
@@ -5468,23 +5450,13 @@ static int ublk_ctrl_unreg_buf(struct ublk_device *ub,
if (index < 0 || index > USHRT_MAX)
return -EINVAL;
- disk = ublk_get_disk(ub);
- if (!disk)
- return -ENODEV;
-
- /* Drain inflight I/O before modifying the maple tree */
- ublk_quiesce_and_release(disk);
-
- mutex_lock(&ub->mutex);
+ memflags = ublk_lock_buf_tree(ub);
ret = __ublk_ctrl_unreg_buf(ub, index);
if (!ret)
ida_free(&ub->buf_ida, index);
- mutex_unlock(&ub->mutex);
-
- ublk_unquiesce_and_resume(disk);
- ublk_put_disk(disk);
+ ublk_unlock_buf_tree(ub, memflags);
return ret;
}
--
2.53.0
* [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (4 preceding siblings ...)
2026-04-09 13:30 ` [PATCH 5/7] ublk: allow buffer registration before device is started Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-09 13:30 ` [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email Ming Lei
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
- Use "physical pages" instead of "page frame numbers (PFNs)" for
clarity
- Remove "without any per-I/O overhead" claim from zero-copy
description
- Add scatter/gather limitation: each I/O's data must be contiguous
within a single registered buffer
Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
Documentation/block/ublk.rst | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index a818e09a4b66..c39d111af2d2 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -492,8 +492,8 @@ The ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path
that works by sharing physical memory pages between the client application
and the ublk server. Unlike the io_uring fixed buffer approach above,
shared memory zero copy does not require io_uring buffer registration
-per I/O — instead, it relies on the kernel matching page frame numbers
-(PFNs) at I/O time. This allows the ublk server to access the shared
+per I/O — instead, it relies on the kernel matching physical pages
+at I/O time. This allows the ublk server to access the shared
buffer directly, which is unlikely for the io_uring fixed buffer
approach.
@@ -507,8 +507,7 @@ tells the server where the data already lives.
``UBLK_F_SHMEM_ZC`` can be thought of as a supplement for optimized client
applications — when the client is willing to allocate I/O buffers from
-shared memory, the entire data path becomes zero-copy without any per-I/O
-overhead.
+shared memory, the entire data path becomes zero-copy.
Use Cases
~~~~~~~~~
@@ -584,6 +583,9 @@ Limitations
the page cache, which allocates its own pages. These kernel-allocated
pages will never match the registered shared buffer. Only ``O_DIRECT``
puts the client's buffer pages directly into the block I/O.
+- **Contiguous data only**: each I/O request's data must be contiguous
+ within a single registered buffer. Scatter/gather I/O that spans
+ multiple non-adjacent registered buffers cannot use the zero-copy path.
Control Commands
~~~~~~~~~~~~~~~~
--
2.53.0
* [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (5 preceding siblings ...)
2026-04-09 13:30 ` [PATCH 6/7] Documentation: ublk: address review comments for SHMEM_ZC docs Ming Lei
@ 2026-04-09 13:30 ` Ming Lei
2026-04-10 1:11 ` [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Jens Axboe
2026-04-10 1:12 ` Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2026-04-09 13:30 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: Caleb Sander Mateos, Ming Lei
Update the ublk userspace block driver maintainer email address
from ming.lei@redhat.com to tom.leiming@gmail.com, as the original
address will become invalid.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 77fdfcb55f06..4abb3345bc4e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26992,7 +26992,7 @@ F: Documentation/filesystems/ubifs.rst
F: fs/ubifs/
UBLK USERSPACE BLOCK DRIVER
-M: Ming Lei <ming.lei@redhat.com>
+M: Ming Lei <tom.leiming@gmail.com>
L: linux-block@vger.kernel.org
S: Maintained
F: Documentation/block/ublk.rst
--
2.53.0
* Re: [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (6 preceding siblings ...)
2026-04-09 13:30 ` [PATCH 7/7] MAINTAINERS: update ublk driver maintainer email Ming Lei
@ 2026-04-10 1:11 ` Jens Axboe
2026-04-10 1:12 ` Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-04-10 1:11 UTC (permalink / raw)
To: linux-block, Ming Lei; +Cc: Caleb Sander Mateos
On Thu, 09 Apr 2026 21:30:12 +0800, Ming Lei wrote:
> Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
> addressing review feedback from Caleb Sander Mateos.
>
> - Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
> (the __u32 field overflowed to 0 for exactly 4GB)
> - Verify all pages in multi-page bvecs fall within the registered maple
> tree range, removing base_pfn from ublk_buf_range since mas.index
> provides the range start PFN
> - Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
> - Replace xarray with IDA for buffer index allocation, removing the
> unnecessary struct ublk_buf
> - Allow buffer registration before device is started by taking ub->mutex
> before freezing the queue (same ordering as ublk_stop_dev_unlocked)
> - Address documentation review comments
> - Update MAINTAINERS email
>
> [...]
Applied, thanks!
[1/7] ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
commit: 23b3b6f0b584b70a427d5bb826d320151890d7da
[2/7] ublk: verify all pages in multi-page bvec fall within registered range
commit: 211ff1602b67e26125977f8b2f369d7c2847628c
[3/7] ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
commit: 8ea8566a9aeef746699d8c84bed3ac44edbfaa0e
[4/7] ublk: replace xarray with IDA for shmem buffer index allocation
commit: 5e864438e2853ef5112d7905fadcc3877e2be70a
[5/7] ublk: allow buffer registration before device is started
commit: 365ea7cc62447caac508706b429cdf031cc15a9f
[6/7] Documentation: ublk: address review comments for SHMEM_ZC docs
commit: 289653bb76c46149f88939c3cfef55cdb236ace2
[7/7] MAINTAINERS: update ublk driver maintainer email
commit: b774765fb804045ee774476ded8e52482ae5ecb7
Best regards,
--
Jens Axboe
* Re: [PATCH 0/7] ublk: followup fixes for SHMEM_ZC
2026-04-09 13:30 [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Ming Lei
` (7 preceding siblings ...)
2026-04-10 1:11 ` [PATCH 0/7] ublk: followup fixes for SHMEM_ZC Jens Axboe
@ 2026-04-10 1:12 ` Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-04-10 1:12 UTC (permalink / raw)
To: Ming Lei, linux-block; +Cc: Caleb Sander Mateos
On 4/9/26 7:30 AM, Ming Lei wrote:
> Hello Jens,
>
> Followup fixes for the SHMEM_ZC (shared memory zero copy) patch series,
> addressing review feedback from Caleb Sander Mateos.
>
> - Widen ublk_shmem_buf_reg.len to __u64 so 4GB buffers can be registered
> (the __u32 field overflowed to 0 for exactly 4GB)
> - Verify all pages in multi-page bvecs fall within the registered maple
> tree range, removing base_pfn from ublk_buf_range since mas.index
> provides the range start PFN
> - Simplify the PFN range coalescing loop in __ublk_ctrl_reg_buf
> - Replace xarray with IDA for buffer index allocation, removing the
> unnecessary struct ublk_buf
> - Allow buffer registration before device is started by taking ub->mutex
> before freezing the queue (same ordering as ublk_stop_dev_unlocked)
> - Address documentation review comments
> - Update MAINTAINERS email
Applied, but I'm unsure what base you used, I'm guessing your old base
rather than my for-7.1/block which already had fixups for the issues I
mentioned? In any case, just ensure that future updates are against the
actual tree, not on top of your previous tree.
--
Jens Axboe