io-uring.vger.kernel.org archive mirror
* [PATCH v2 0/5] io_uring zcrx ifq sharing
@ 2025-10-25 19:14 David Wei
  2025-10-25 19:15 ` [PATCH v2 1/5] io_uring/rsrc: rename and export io_lock_two_rings() David Wei
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: David Wei @ 2025-10-25 19:14 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Each ifq is bound to a HW RX queue, with no way to share it across
multiple rings. It is possible that one ring will not be able to fully
saturate an entire HW RX queue due to userspace work. There are two ways
to handle more work:

  1. Move work to other threads, paying context switch overhead and
     suffering cold caches.
  2. Add more rings with ifqs, but HW RX queues are a limited resource.

This patchset adds a way for multiple rings to share the same underlying
real ifq that is bound to a HW RX queue. This is done by having dst
rings create proxy ifqs that point to a real ifq in a src ring. Rings
with proxy ifqs can issue io_recvzc on zero copy sockets, just like the
src ring.

Userspace is expected to create rings in separate threads, not separate
processes, so that all rings share the same address space. This is
because the sharing and synchronisation of refill rings is done purely
in userspace with no kernel involvement, e.g. dst rings do not mmap the
refill ring. Also, userspace must distribute zero copy sockets steered
into the same HW RX queue across the shared ifqs.

David Wei (5):
  io_uring/rsrc: rename and export io_lock_two_rings()
  io_uring/zcrx: add refcount to struct io_zcrx_ifq
  io_uring/zcrx: share an ifq between rings
  io_uring/zcrx: redirect io_recvzc on proxy ifq to src ifq
  io_uring/zcrx: free proxy ifqs

 include/uapi/linux/io_uring.h |  4 ++
 io_uring/net.c                |  2 +
 io_uring/rsrc.c               |  4 +-
 io_uring/rsrc.h               |  1 +
 io_uring/zcrx.c               | 91 +++++++++++++++++++++++++++++++++--
 io_uring/zcrx.h               |  3 ++
 6 files changed, 100 insertions(+), 5 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/5] io_uring/rsrc: rename and export io_lock_two_rings()
  2025-10-25 19:14 [PATCH v2 0/5] io_uring zcrx ifq sharing David Wei
@ 2025-10-25 19:15 ` David Wei
  2025-10-25 19:15 ` [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq David Wei
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: David Wei @ 2025-10-25 19:15 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Rename lock_two_rings() to io_lock_two_rings() and export it. This will be
used when registering a src ifq owned by one ring as a proxy in another
ring. During this process both rings need to be locked in a
deterministic order, similar to the existing user, io_clone_buffers().

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/rsrc.c | 4 ++--
 io_uring/rsrc.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d787c16dc1c3..d245b7592eee 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1148,7 +1148,7 @@ int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 }
 
 /* Lock two rings at once. The rings must be different! */
-static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
+void io_lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 {
 	if (ctx1 > ctx2)
 		swap(ctx1, ctx2);
@@ -1299,7 +1299,7 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 	src_ctx = file->private_data;
 	if (src_ctx != ctx) {
 		mutex_unlock(&ctx->uring_lock);
-		lock_two_rings(ctx, src_ctx);
+		io_lock_two_rings(ctx, src_ctx);
 
 		if (src_ctx->submitter_task &&
 		    src_ctx->submitter_task != current) {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index a3ca6ba66596..b002c4a5a8cd 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -70,6 +70,7 @@ int io_import_reg_vec(int ddir, struct iov_iter *iter,
 int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
 			const struct iovec __user *uvec, size_t uvec_segs);
 
+void io_lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2);
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
-- 
2.47.3



* [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq
  2025-10-25 19:14 [PATCH v2 0/5] io_uring zcrx ifq sharing David Wei
  2025-10-25 19:15 ` [PATCH v2 1/5] io_uring/rsrc: rename and export io_lock_two_rings() David Wei
@ 2025-10-25 19:15 ` David Wei
  2025-10-25 23:37   ` Jens Axboe
  2025-10-25 19:15 ` [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings David Wei
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: David Wei @ 2025-10-25 19:15 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Add a refcount to struct io_zcrx_ifq to track the number of proxy ifqs
that refer to it. The initial count is 1, meaning there are no proxy ifqs.

This ref is checked in io_shutdown_zcrx_ifqs() to ensure that an ifq is
not cleaned up while there are still proxy ifqs alive.

Opted for a bog-standard refcount_t. The increment and decrement
operations are expected to happen only during the slow setup/teardown
paths, and a src ifq is only expected to be shared a handful of times
at most.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/zcrx.c | 3 +++
 io_uring/zcrx.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index a816f5902091..22d759307c16 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -587,6 +587,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	if (!ifq)
 		return -ENOMEM;
 	ifq->rq_entries = reg.rq_entries;
+	refcount_set(&ifq->refs, 1);
 
 	scoped_guard(mutex, &ctx->mmap_lock) {
 		/* preallocate id */
@@ -730,6 +731,8 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
 	lockdep_assert_held(&ctx->uring_lock);
 
 	xa_for_each(&ctx->zcrx_ctxs, index, ifq) {
+		if (refcount_read(&ifq->refs) > 1)
+			continue;
 		io_zcrx_scrub(ifq);
 		io_close_queue(ifq);
 	}
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 33ef61503092..566d519cbaf6 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -60,6 +60,8 @@ struct io_zcrx_ifq {
 	 */
 	struct mutex			pp_lock;
 	struct io_mapped_region		region;
+
+	refcount_t			refs;
 };
 
 #if defined(CONFIG_IO_URING_ZCRX)
-- 
2.47.3



* [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-25 19:14 [PATCH v2 0/5] io_uring zcrx ifq sharing David Wei
  2025-10-25 19:15 ` [PATCH v2 1/5] io_uring/rsrc: rename and export io_lock_two_rings() David Wei
  2025-10-25 19:15 ` [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq David Wei
@ 2025-10-25 19:15 ` David Wei
  2025-10-25 23:41   ` Jens Axboe
  2025-10-25 19:15 ` [PATCH v2 4/5] io_uring/zcrx: redirect io_recvzc on proxy ifq to src ifq David Wei
  2025-10-25 19:15 ` [PATCH v2 5/5] io_uring/zcrx: free proxy ifqs David Wei
  4 siblings, 1 reply; 13+ messages in thread
From: David Wei @ 2025-10-25 19:15 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Add a way to share a real ifq, i.e. one bound to a HW RX queue, from a
src ring with other rings. This is done by passing a new flag
IORING_ZCRX_IFQ_REG_PROXY in the registration struct
io_uring_zcrx_ifq_reg, alongside the fd of the src ring and the id of
the ifq to be proxied.

It is not permitted to proxy another proxy ifq; it must be a real ifq.

Most fields in the proxy ifq are zero initialised. The exception is
ifq->if_rxq, which is set to the same value as in the src ifq, because
the cleanup code uses if_rxq to mark an ifq that has been cleaned up
but not yet freed.

To prevent the src ring or ifq from being cleaned up or freed while the
proxy exists, take the appropriate refs on the src ring (ctx->refs) and
src ifq (ifq->refs). The subsequent patch that adds cleaning up proxy
ifqs will discuss lifetimes in more detail.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/uapi/linux/io_uring.h |  4 ++
 io_uring/zcrx.c               | 72 ++++++++++++++++++++++++++++++++++-
 io_uring/zcrx.h               |  1 +
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 263bed13473e..10f6347b1725 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -1055,6 +1055,10 @@ struct io_uring_zcrx_area_reg {
 	__u64	__resv2[2];
 };
 
+enum io_uring_zcrx_ifq_reg_flags {
+	IORING_ZCRX_IFQ_REG_PROXY	= 1,
+};
+
 /*
  * Argument for IORING_REGISTER_ZCRX_IFQ
  */
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 22d759307c16..6b9066333fcf 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -22,10 +22,10 @@
 #include <uapi/linux/io_uring.h>
 
 #include "io_uring.h"
-#include "kbuf.h"
 #include "memmap.h"
 #include "zcrx.h"
 #include "rsrc.h"
+#include "register.h"
 
 #define IO_ZCRX_AREA_SUPPORTED_FLAGS	(IORING_ZCRX_AREA_DMABUF)
 
@@ -541,6 +541,74 @@ struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
 	return ifq ? &ifq->region : NULL;
 }
 
+static int io_proxy_zcrx_ifq(struct io_ring_ctx *ctx,
+			     struct io_uring_zcrx_ifq_reg __user *arg,
+			     struct io_uring_zcrx_ifq_reg *reg)
+{
+	struct io_zcrx_ifq *ifq, *src_ifq;
+	struct io_ring_ctx *src_ctx;
+	struct file *file;
+	int src_fd, ret;
+	u32 src_id, id;
+
+	src_fd = reg->if_idx;
+	src_id = reg->if_rxq;
+
+	file = io_uring_register_get_file(src_fd, false);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	src_ctx = file->private_data;
+	if (src_ctx == ctx)
+		return -EBADFD;
+
+	mutex_unlock(&ctx->uring_lock);
+	io_lock_two_rings(ctx, src_ctx);
+
+	ret = -EINVAL;
+	src_ifq = xa_load(&src_ctx->zcrx_ctxs, src_id);
+	if (!src_ifq || src_ifq->proxy)
+		goto err_unlock;
+
+	percpu_ref_get(&src_ctx->refs);
+	refcount_inc(&src_ifq->refs);
+
+	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+	ifq->proxy = src_ifq;
+	ifq->ctx = ctx;
+	ifq->if_rxq = src_ifq->if_rxq;
+
+	scoped_guard(mutex, &ctx->mmap_lock) {
+		ret = xa_alloc(&ctx->zcrx_ctxs, &id, NULL, xa_limit_31b, GFP_KERNEL);
+		if (ret)
+			goto err_free;
+
+		ret = -ENOMEM;
+		if (xa_store(&ctx->zcrx_ctxs, id, ifq, GFP_KERNEL)) {
+			xa_erase(&ctx->zcrx_ctxs, id);
+			goto err_free;
+		}
+	}
+
+	reg->zcrx_id = id;
+	if (copy_to_user(arg, reg, sizeof(*reg))) {
+		ret = -EFAULT;
+		goto err;
+	}
+	mutex_unlock(&src_ctx->uring_lock);
+	fput(file);
+	return 0;
+err:
+	scoped_guard(mutex, &ctx->mmap_lock)
+		xa_erase(&ctx->zcrx_ctxs, id);
+err_free:
+	kfree(ifq);
+err_unlock:
+	mutex_unlock(&src_ctx->uring_lock);
+	fput(file);
+	return ret;
+}
+
 int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 			  struct io_uring_zcrx_ifq_reg __user *arg)
 {
@@ -566,6 +634,8 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 		return -EINVAL;
 	if (copy_from_user(&reg, arg, sizeof(reg)))
 		return -EFAULT;
+	if (reg.flags & IORING_ZCRX_IFQ_REG_PROXY)
+		return io_proxy_zcrx_ifq(ctx, arg, &reg);
 	if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
 		return -EFAULT;
 	if (!mem_is_zero(&reg.__resv, sizeof(reg.__resv)) ||
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 566d519cbaf6..0df956cb9592 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -62,6 +62,7 @@ struct io_zcrx_ifq {
 	struct io_mapped_region		region;
 
 	refcount_t			refs;
+	struct io_zcrx_ifq		*proxy;
 };
 
 #if defined(CONFIG_IO_URING_ZCRX)
-- 
2.47.3



* [PATCH v2 4/5] io_uring/zcrx: redirect io_recvzc on proxy ifq to src ifq
  2025-10-25 19:14 [PATCH v2 0/5] io_uring zcrx ifq sharing David Wei
                   ` (2 preceding siblings ...)
  2025-10-25 19:15 ` [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings David Wei
@ 2025-10-25 19:15 ` David Wei
  2025-10-25 19:15 ` [PATCH v2 5/5] io_uring/zcrx: free proxy ifqs David Wei
  4 siblings, 0 replies; 13+ messages in thread
From: David Wei @ 2025-10-25 19:15 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Technically there is no reason why one ring can't issue io_recvzc on a
socket that is steered into a zero copy HW RX queue bound to an ifq in
another ring. No ifq locks are taken in the happy zero copy path; only
socket locks. If copy fallback is needed, the freelist spinlock is taken,
which ensures multiple contexts can synchronise access.

Writing to the tail of the refill ring needs to be synchronised, though
that can be done purely from userspace.

The only thing preventing this today is a check in io_zcrx_recv_frag()
that returns -EFAULT if the ifq of the net_iov in an skb does not match
the request's ifq. The net_iov's ifq is the one that owns the memory
provider bound to a HW RX queue.

The previous patches added a proxy ifq that holds a pointer to the src
ifq. To pass this check, have io_recvzc use the src ifq instead.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index a95cc9ca2a4d..8eb6145e0f4d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1250,6 +1250,8 @@ int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->ifq = xa_load(&req->ctx->zcrx_ctxs, ifq_idx);
 	if (!zc->ifq)
 		return -EINVAL;
+	if (zc->ifq->proxy)
+		zc->ifq = zc->ifq->proxy;
 
 	zc->len = READ_ONCE(sqe->len);
 	zc->flags = READ_ONCE(sqe->ioprio);
-- 
2.47.3



* [PATCH v2 5/5] io_uring/zcrx: free proxy ifqs
  2025-10-25 19:14 [PATCH v2 0/5] io_uring zcrx ifq sharing David Wei
                   ` (3 preceding siblings ...)
  2025-10-25 19:15 ` [PATCH v2 4/5] io_uring/zcrx: redirect io_recvzc on proxy ifq to src ifq David Wei
@ 2025-10-25 19:15 ` David Wei
  4 siblings, 0 replies; 13+ messages in thread
From: David Wei @ 2025-10-25 19:15 UTC (permalink / raw)
  To: io-uring, netdev; +Cc: Jens Axboe, Pavel Begunkov

Freeing an ifq is a two-step process: in io_ring_exit_work(), the ifqs
are first cleaned up via io_shutdown_zcrx_ifqs() while there are still
outstanding ctx->refs. Then once ctx->refs falls to 0, the ifqs are
freed in io_unregister_zcrx_ifqs().

The main thing to note is that io_shutdown_zcrx_ifqs() may be called
multiple times. To ensure each ifq is only cleaned up once, set
ifq->if_rxq to -1 once cleanup is done.

Proxy ifqs hold two refs: one on the src ifq and one on the src ring.
In io_shutdown_zcrx_ifqs(), drop both refs. While these refs are held,
the src ring loops in io_ring_exit_work(): the active refs on ifq->refs
prevent the src ifqs from being cleaned up, and the active refs on
ctx->refs prevent the objects from being freed. Once all refs are gone,
the src ring proceeds with io_ring_ctx_free().

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/zcrx.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 6b9066333fcf..2d0d1ca016c5 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -519,6 +519,8 @@ static void io_close_queue(struct io_zcrx_ifq *ifq)
 
 static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 {
+	if (ifq->proxy)
+		goto free;
 	io_close_queue(ifq);
 
 	if (ifq->area)
@@ -528,6 +530,7 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 
 	io_free_rbuf_ring(ifq);
 	mutex_destroy(&ifq->pp_lock);
+free:
 	kfree(ifq);
 }
 
@@ -801,10 +804,19 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
 	lockdep_assert_held(&ctx->uring_lock);
 
 	xa_for_each(&ctx->zcrx_ctxs, index, ifq) {
-		if (refcount_read(&ifq->refs) > 1)
+		if (ifq->if_rxq == -1)
 			continue;
-		io_zcrx_scrub(ifq);
-		io_close_queue(ifq);
+
+		if (!ifq->proxy) {
+			if (refcount_read(&ifq->refs) > 1)
+				continue;
+			io_zcrx_scrub(ifq);
+			io_close_queue(ifq);
+		} else {
+			refcount_dec(&ifq->proxy->refs);
+			percpu_ref_put(&ifq->proxy->ctx->refs);
+			ifq->if_rxq = -1;
+		}
 	}
 }
 
-- 
2.47.3



* Re: [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq
  2025-10-25 19:15 ` [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq David Wei
@ 2025-10-25 23:37   ` Jens Axboe
  2025-10-26  4:10     ` David Wei
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-25 23:37 UTC (permalink / raw)
  To: David Wei, io-uring, netdev; +Cc: Pavel Begunkov

On 10/25/25 1:15 PM, David Wei wrote:
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index a816f5902091..22d759307c16 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -730,6 +731,8 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
>  	lockdep_assert_held(&ctx->uring_lock);
>  
>  	xa_for_each(&ctx->zcrx_ctxs, index, ifq) {
> +		if (refcount_read(&ifq->refs) > 1)
> +			continue;

This is a bit odd; it's not an idiomatic way to use reference counts.
Why isn't this a refcount_dec_and_test()? Given that both the later grab
when sharing is enabled and the shutdown here are under the ->uring_lock
this may not matter, but it'd be a lot more obviously correct if it
looked ala:

		if (refcount_dec_and_test(&ifq->refs)) {
  			io_zcrx_scrub(ifq);
  			io_close_queue(ifq);
		}

instead?

-- 
Jens Axboe


* Re: [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-25 19:15 ` [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings David Wei
@ 2025-10-25 23:41   ` Jens Axboe
  2025-10-26  4:12     ` David Wei
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-25 23:41 UTC (permalink / raw)
  To: David Wei, io-uring, netdev; +Cc: Pavel Begunkov

On 10/25/25 1:15 PM, David Wei wrote:
> @@ -541,6 +541,74 @@ struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
>  	return ifq ? &ifq->region : NULL;
>  }
>  
> +static int io_proxy_zcrx_ifq(struct io_ring_ctx *ctx,
> +			     struct io_uring_zcrx_ifq_reg __user *arg,
> +			     struct io_uring_zcrx_ifq_reg *reg)
> +{
> +	struct io_zcrx_ifq *ifq, *src_ifq;
> +	struct io_ring_ctx *src_ctx;
> +	struct file *file;
> +	int src_fd, ret;
> +	u32 src_id, id;
> +
> +	src_fd = reg->if_idx;
> +	src_id = reg->if_rxq;
> +
> +	file = io_uring_register_get_file(src_fd, false);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	src_ctx = file->private_data;
> +	if (src_ctx == ctx)
> +		return -EBADFD;
> +
> +	mutex_unlock(&ctx->uring_lock);
> +	io_lock_two_rings(ctx, src_ctx);
> +
> +	ret = -EINVAL;
> +	src_ifq = xa_load(&src_ctx->zcrx_ctxs, src_id);
> +	if (!src_ifq || src_ifq->proxy)
> +		goto err_unlock;
> +
> +	percpu_ref_get(&src_ctx->refs);
> +	refcount_inc(&src_ifq->refs);
> +
> +	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);

This still needs a:

	if (!ifq)
		handle error

addition, like mentioned for v1. Would probably make sense to just
assume that everything is hunky-dory and allocate it upfront/early, and
just kill it in the error path. Would probably help remove one of the
goto labels.

> +	ifq->proxy = src_ifq;

For this, since the ifq is shared and reference counted, why don't they
just point at the same memory here? Would avoid having this ->proxy
thing and just skipping to that in other spots where the actual
io_zcrx_ifq is required?

-- 
Jens Axboe


* Re: [PATCH v2 2/5] io_uring/zcrx: add refcount to struct io_zcrx_ifq
  2025-10-25 23:37   ` Jens Axboe
@ 2025-10-26  4:10     ` David Wei
  0 siblings, 0 replies; 13+ messages in thread
From: David Wei @ 2025-10-26  4:10 UTC (permalink / raw)
  To: Jens Axboe, io-uring, netdev; +Cc: Pavel Begunkov

On 2025-10-25 16:37, Jens Axboe wrote:
> On 10/25/25 1:15 PM, David Wei wrote:
>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>> index a816f5902091..22d759307c16 100644
>> --- a/io_uring/zcrx.c
>> +++ b/io_uring/zcrx.c
>> @@ -730,6 +731,8 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
>>   	lockdep_assert_held(&ctx->uring_lock);
>>   
>>   	xa_for_each(&ctx->zcrx_ctxs, index, ifq) {
>> +		if (refcount_read(&ifq->refs) > 1)
>> +			continue;
> 
> This is a bit odd, it's not an idiomatic way to use reference counts.
> Why isn't this a refcount_dec_and_test()? Given that both the later grab
> when sharing is enabled and the shutdown here are under the ->uring_lock
> this may not matter, but it'd be a lot more obviously correct if it
> looked ala:
> 
> 		if (refcount_dec_and_test(&ifq->refs)) {
>    			io_zcrx_scrub(ifq);
>    			io_close_queue(ifq);
> 		}
> 
> instead?
> 

Yeah, good idea. Your comments prompted me to try to find a better
solution that gets rid of ifq->proxy. Turns out xarray has 3 bits per
entry that can be 'marked'. With this I can get a cleaner solution. Will
respin tomorrow.


* Re: [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-25 23:41   ` Jens Axboe
@ 2025-10-26  4:12     ` David Wei
  2025-10-26 13:16       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: David Wei @ 2025-10-26  4:12 UTC (permalink / raw)
  To: Jens Axboe, io-uring, netdev; +Cc: Pavel Begunkov

On 2025-10-25 16:41, Jens Axboe wrote:
> On 10/25/25 1:15 PM, David Wei wrote:
>> @@ -541,6 +541,74 @@ struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
>>   	return ifq ? &ifq->region : NULL;
>>   }
>>   
>> +static int io_proxy_zcrx_ifq(struct io_ring_ctx *ctx,
>> +			     struct io_uring_zcrx_ifq_reg __user *arg,
>> +			     struct io_uring_zcrx_ifq_reg *reg)
>> +{
>> +	struct io_zcrx_ifq *ifq, *src_ifq;
>> +	struct io_ring_ctx *src_ctx;
>> +	struct file *file;
>> +	int src_fd, ret;
>> +	u32 src_id, id;
>> +
>> +	src_fd = reg->if_idx;
>> +	src_id = reg->if_rxq;
>> +
>> +	file = io_uring_register_get_file(src_fd, false);
>> +	if (IS_ERR(file))
>> +		return PTR_ERR(file);
>> +
>> +	src_ctx = file->private_data;
>> +	if (src_ctx == ctx)
>> +		return -EBADFD;
>> +
>> +	mutex_unlock(&ctx->uring_lock);
>> +	io_lock_two_rings(ctx, src_ctx);
>> +
>> +	ret = -EINVAL;
>> +	src_ifq = xa_load(&src_ctx->zcrx_ctxs, src_id);
>> +	if (!src_ifq || src_ifq->proxy)
>> +		goto err_unlock;
>> +
>> +	percpu_ref_get(&src_ctx->refs);
>> +	refcount_inc(&src_ifq->refs);
>> +
>> +	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
> 
> This still needs a:
> 
> 	if (!ifq)
> 		handle error
> 
> addition, like mentioned for v1. Would probably make sense to just
> assume that everything is honky dory and allocate it upfront/early, and
> just kill it in the error path. Would probably help remove one of the
> goto labels.

Sorry I missed this during the splitting. Will include in v3.

> 
>> +	ifq->proxy = src_ifq;
> 
> For this, since the ifq is shared and reference counted, why don't they
> just point at the same memory here? Would avoid having this ->proxy
> thing and just skipping to that in other spots where the actual
> io_zcrx_ifq is required?
> 

I wanted a way to separate src and dst rings, while also decrementing
refcounts once and only once. I used separate ifq objects to do this,
but having learnt about xarray marks, I think I can use that instead.


* Re: [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-26  4:12     ` David Wei
@ 2025-10-26 13:16       ` Jens Axboe
  2025-10-26 13:43         ` Jens Axboe
  2025-10-26 15:06         ` David Wei
  0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2025-10-26 13:16 UTC (permalink / raw)
  To: David Wei, io-uring, netdev; +Cc: Pavel Begunkov

On 10/25/25 10:12 PM, David Wei wrote:
> Sorry I missed this during the splitting. Will include in v3.
> 
>>
>>> +    ifq->proxy = src_ifq;
>>
>> For this, since the ifq is shared and reference counted, why don't they
>> just point at the same memory here? Would avoid having this ->proxy
>> thing and just skipping to that in other spots where the actual
>> io_zcrx_ifq is required?
>>
> 
> I wanted a way to separate src and dst rings, while also decrementing
> refcounts once and only once. I used separate ifq objects to do this,
> but having learnt about xarray marks, I think I can use that instead.

I'm confused, why do you even need that? You already have
ifq->proxy which is just a "link" to the shared queue, why aren't both
rings just using the same ifq structure? You already increment the
refcount when you add proxy, why can't the new ring just store the same
ifq?

-- 
Jens Axboe


* Re: [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-26 13:16       ` Jens Axboe
@ 2025-10-26 13:43         ` Jens Axboe
  2025-10-26 15:06         ` David Wei
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2025-10-26 13:43 UTC (permalink / raw)
  To: David Wei, io-uring, netdev; +Cc: Pavel Begunkov

On 10/26/25 7:16 AM, Jens Axboe wrote:
> On 10/25/25 10:12 PM, David Wei wrote:
>> Sorry I missed this during the splitting. Will include in v3.
>>
>>>
>>>> +    ifq->proxy = src_ifq;
>>>
>>> For this, since the ifq is shared and reference counted, why don't they
>>> just point at the same memory here? Would avoid having this ->proxy
>>> thing and just skipping to that in other spots where the actual
>>> io_zcrx_ifq is required?
>>>
>>
>> I wanted a way to separate src and dst rings, while also decrementing
>> refcounts once and only once. I used separate ifq objects to do this,
>> but having learnt about xarray marks, I think I can use that instead.
> 
> I'm confused, why do you even need that? You already have
> ifq->proxy which is just a "link" to the shared queue, why aren't both
> rings just using the same ifq structure? You already increment the
> refcount when you add proxy, why can't the new ring just store the same
> ifq?

And just to follow up - if this isn't directly feasible, then I think
the ifq bits need to be refactored a bit first to facilitate having it
be shared. Having a dummy ifq that only uses ->proxy is not super clean
to look at. From a quick look, ->ctx is the odd one out there, that part
is obviously not shared between the two rings.

-- 
Jens Axboe


* Re: [PATCH v2 3/5] io_uring/zcrx: share an ifq between rings
  2025-10-26 13:16       ` Jens Axboe
  2025-10-26 13:43         ` Jens Axboe
@ 2025-10-26 15:06         ` David Wei
  1 sibling, 0 replies; 13+ messages in thread
From: David Wei @ 2025-10-26 15:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring, netdev; +Cc: Pavel Begunkov

On 2025-10-26 06:16, Jens Axboe wrote:
> On 10/25/25 10:12 PM, David Wei wrote:
>> Sorry I missed this during the splitting. Will include in v3.
>>
>>>
>>>> +    ifq->proxy = src_ifq;
>>>
>>> For this, since the ifq is shared and reference counted, why don't they
>>> just point at the same memory here? Would avoid having this ->proxy
>>> thing and just skipping to that in other spots where the actual
>>> io_zcrx_ifq is required?
>>>
>>
>> I wanted a way to separate src and dst rings, while also decrementing
>> refcounts once and only once. I used separate ifq objects to do this,
>> but having learnt about xarray marks, I think I can use that instead.
> 
> I'm confused, why do you even need that? You already have
> ifq->proxy which is just a "link" to the shared queue, why aren't both
> rings just using the same ifq structure? You already increment the
> refcount when you add proxy, why can't the new ring just store the same
> ifq?
> 

The main reason is I want only-once semantics for decrementing the
refcounts. I used a separate ifq to do this, marking it with -1
afterwards, but I learnt about xarray marks which lets me do the same
thing.

Hopefully v3 will make it clearer what I mean. It's looking really
clean.

