* [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence

Returning buffers via a ring is efficient but can cause problems
when the ring doesn't have space. Add a way to return buffers
synchronously via io_uring "register" syscall, which should serve
as a slow fallback path.

For a full branch with all relevant dependencies see
https://github.com/isilence/linux.git zcrx/for-next

Pavel Begunkov (2):
  io_uring/zcrx: introduce io_parse_rqe()
  io_uring/zcrx: allow synchronous buffer return

 include/uapi/linux/io_uring.h |  10 ++++
 io_uring/register.c           |   3 +
 io_uring/zcrx.c               | 100 +++++++++++++++++++++++++++++-----
 io_uring/zcrx.h               |   7 +++
 4 files changed, 107 insertions(+), 13 deletions(-)

-- 
2.49.0

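The "ring doesn't have space" case from the cover letter can be pictured
with a small user-space model. This is only a sketch under assumptions:
struct model_rq and both functions are hypothetical names, the layout is
not the kernel's, and real code sharing indices with the kernel would need
acquire/release ordering that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space view of a refill ring; not the kernel's layout. */
struct model_rq {
	uint32_t head;		/* consumer index, advanced by the kernel */
	uint32_t tail;		/* producer index, advanced by user space */
	uint32_t entries;	/* ring size, assumed to be a power of two */
	uint64_t *slots;	/* one buffer offset per slot */
};

/* Fast path: queue one buffer offset, or report that the ring is full. */
static bool model_rq_try_push(struct model_rq *rq, uint64_t off)
{
	assert((rq->entries & (rq->entries - 1)) == 0);

	if (rq->tail - rq->head == rq->entries)
		return false;
	rq->slots[rq->tail & (rq->entries - 1)] = off;
	rq->tail++;
	return true;
}

/*
 * Return nr buffers through the ring and collect whatever does not fit
 * into overflow[], so the caller can pass those to the synchronous path.
 * Returns the number of entries written to overflow[].
 */
static size_t model_return_buffers(struct model_rq *rq, const uint64_t *offs,
				   size_t nr, uint64_t *overflow)
{
	size_t spilled = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!model_rq_try_push(rq, offs[i]))
			overflow[spilled++] = offs[i];
	}
	return spilled;
}
```

In this model the ring stays the default path and only the spilled
entries would go through the "register" syscall, which matches the
description of it as a slow fallback.
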
* [zcrx-next 1/2] io_uring/zcrx: introduce io_parse_rqe()
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence

Add a helper for verifying a rqe and extracting a niov out of it. It'll
be reused in following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index b3cfe0c04920..d510ebc3d382 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -727,6 +727,28 @@ static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq,
 	return &ifq->rqes[idx];
 }
 
+static inline bool io_parse_rqe(struct io_uring_zcrx_rqe *rqe,
+				struct io_zcrx_ifq *ifq,
+				struct net_iov **ret_niov)
+{
+	unsigned niov_idx, area_idx;
+	struct io_zcrx_area *area;
+
+	area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
+	niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
+
+	if (unlikely(rqe->__pad || area_idx))
+		return false;
+	area = ifq->area;
+
+	if (unlikely(niov_idx >= area->nia.num_niovs))
+		return false;
+	niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
+
+	*ret_niov = &area->nia.niovs[niov_idx];
+	return true;
+}
+
 static void io_zcrx_ring_refill(struct page_pool *pp,
 				struct io_zcrx_ifq *ifq)
 {
@@ -741,23 +763,11 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
 
 	do {
 		struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
-		struct io_zcrx_area *area;
 		struct net_iov *niov;
-		unsigned niov_idx, area_idx;
 		netmem_ref netmem;
 
-		area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
-		niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
-
-		if (unlikely(rqe->__pad || area_idx))
+		if (!io_parse_rqe(rqe, ifq, &niov))
 			continue;
-		area = ifq->area;
-
-		if (unlikely(niov_idx >= area->nia.num_niovs))
-			continue;
-		niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
-
-		niov = &area->nia.niovs[niov_idx];
 
 		if (!io_zcrx_put_niov_uref(niov))
 			continue;
-- 
2.49.0

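The checks performed by io_parse_rqe() can be tried out in user space
with a standalone model. What follows is a sketch, not the kernel code:
model_rqe, model_ifq and model_parse_rqe are hypothetical names, the
shift and mask are local copies of the values in
include/uapi/linux/io_uring.h, and the speculation barrier
(array_index_nospec) has no user-space counterpart here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of IORING_ZCRX_AREA_SHIFT / IORING_ZCRX_AREA_MASK. */
#define ZCRX_AREA_SHIFT	48
#define ZCRX_AREA_MASK	(~(((uint64_t)1 << ZCRX_AREA_SHIFT) - 1))

/* Same field layout as struct io_uring_zcrx_rqe. */
struct model_rqe {
	uint64_t off;
	uint32_t len;
	uint32_t pad;
};

/* Hypothetical stand-in for the two ifq fields the helper reads. */
struct model_ifq {
	unsigned niov_shift;	/* log2 of the buffer size */
	unsigned num_niovs;	/* number of buffers in the single area */
};

/* Validate an rqe and extract the buffer index, mirroring the helper. */
static bool model_parse_rqe(const struct model_rqe *rqe,
			    const struct model_ifq *ifq, unsigned *ret_idx)
{
	uint64_t area_idx, niov_idx;

	assert(ifq->niov_shift < ZCRX_AREA_SHIFT);

	area_idx = rqe->off >> ZCRX_AREA_SHIFT;
	niov_idx = (rqe->off & ~ZCRX_AREA_MASK) >> ifq->niov_shift;

	/* Padding must be zero and only area 0 is accepted. */
	if (rqe->pad || area_idx)
		return false;
	/* The index must point inside the area. */
	if (niov_idx >= ifq->num_niovs)
		return false;

	*ret_idx = (unsigned)niov_idx;
	return true;
}
```

With niov_shift = 12 and four buffers, an offset of 2 * 4096 + 100 maps
to index 2, while an offset of 4 * 4096, a non-zero area index, or a
non-zero pad are all rejected. Note that the model keeps the index in
64 bits before the bounds check, whereas the patch stores it in an
unsigned.
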
* [zcrx-next 2/2] io_uring/zcrx: allow synchronous buffer return
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence

Returning buffers via a ring is performant and convenient, but it
becomes a problem when/if the user misconfigured the ring size and it
becomes full. Add a synchronous way to return buffers back to the page
pool via a new register opcode. It's supposed to be a reliable slow
path for refilling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h | 10 ++++++
 io_uring/register.c           |  3 ++
 io_uring/zcrx.c               | 64 +++++++++++++++++++++++++++++++++++
 io_uring/zcrx.h               |  7 ++++
 4 files changed, 84 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6957dc539d83..97b206df4cc1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -665,6 +665,9 @@ enum io_uring_register_op {
 
 	IORING_REGISTER_MEM_REGION		= 34,
 
+	/* return zcrx buffers back into circulation */
+	IORING_REGISTER_ZCRX_REFILL		= 35,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -1046,6 +1049,13 @@ struct io_uring_zcrx_ifq_reg {
 	__u64	__resv[3];
 };
 
+struct io_uring_zcrx_refill {
+	__u32		zcrx_id;
+	__u32		nr_entries;
+	__u64		rqes;
+	__u64		__resv[2];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/register.c b/io_uring/register.c
index a59589249fce..5155ea627f65 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -835,6 +835,9 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_mem_region(ctx, arg);
 		break;
+	case IORING_REGISTER_ZCRX_REFILL:
+		ret = io_zcrx_return_bufs(ctx, arg, nr_args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index d510ebc3d382..4540e5cd7430 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -922,6 +922,70 @@ static const struct memory_provider_ops io_uring_pp_zc_ops = {
 	.uninstall		= io_pp_uninstall,
 };
 
+#define IO_ZCRX_MAX_SYS_REFILL_BUFS		(1 << 16)
+#define IO_ZCRX_SYS_REFILL_BATCH		32
+
+static void io_return_buffers(struct io_zcrx_ifq *ifq,
+			      struct io_uring_zcrx_rqe *rqes, unsigned nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		struct net_iov *niov;
+		netmem_ref netmem;
+
+		if (!io_parse_rqe(&rqes[i], ifq, &niov))
+			continue;
+
+		scoped_guard(spinlock_bh, &ifq->rq_lock) {
+			if (!io_zcrx_put_niov_uref(niov))
+				continue;
+		}
+
+		netmem = net_iov_to_netmem(niov);
+		if (!page_pool_unref_and_test(netmem))
+			continue;
+		io_zcrx_return_niov(niov);
+	}
+}
+
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+			void __user *arg, unsigned nr_arg)
+{
+	struct io_uring_zcrx_rqe rqes[IO_ZCRX_SYS_REFILL_BATCH];
+	struct io_uring_zcrx_rqe __user *urqes;
+	struct io_uring_zcrx_refill zr;
+	struct io_zcrx_ifq *ifq;
+	unsigned nr, i;
+
+	if (nr_arg)
+		return -EINVAL;
+	if (copy_from_user(&zr, arg, sizeof(zr)))
+		return -EFAULT;
+	if (!zr.nr_entries || zr.nr_entries > IO_ZCRX_MAX_SYS_REFILL_BUFS)
+		return -EINVAL;
+	if (!mem_is_zero(&zr.__resv, sizeof(zr.__resv)))
+		return -EINVAL;
+
+	ifq = xa_load(&ctx->zcrx_ctxs, zr.zcrx_id);
+	if (!ifq)
+		return -EINVAL;
+	nr = zr.nr_entries;
+	urqes = u64_to_user_ptr(zr.rqes);
+
+	for (i = 0; i < nr;) {
+		unsigned batch = min(nr - i, IO_ZCRX_SYS_REFILL_BATCH);
+
+		if (copy_from_user(rqes, urqes + i, sizeof(rqes)))
+			return i ? i : -EFAULT;
+		io_return_buffers(ifq, rqes, batch);
+
+		i += batch;
+		cond_resched();
+	}
+	return nr;
+}
+
 static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
 			      struct io_zcrx_ifq *ifq, int off, int len)
 {
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index a48871b5adad..33ef61503092 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -63,6 +63,8 @@ struct io_zcrx_ifq {
 };
 
 #if defined(CONFIG_IO_URING_ZCRX)
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+			void __user *arg, unsigned nr_arg);
 int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 			 struct io_uring_zcrx_ifq_reg __user *arg);
 void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx);
@@ -95,6 +97,11 @@ static inline struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ct
 {
 	return NULL;
 }
+static inline int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+				      void __user *arg, unsigned nr_arg)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
-- 
2.49.0

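From the user side, the new opcode might be driven roughly as follows.
This is a sketch under assumptions: the struct and the opcode value 35
are copied from the patch as posted and may differ in whatever is
eventually merged, installed headers are assumed not to have them yet,
and zcrx_rqe, zcrx_refill, zcrx_refill_prepare and zcrx_refill_sync are
hypothetical local names. Only the raw io_uring_register syscall is
used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values taken from the posted patch; treat them as provisional. */
#define ZCRX_REGISTER_REFILL	35		/* IORING_REGISTER_ZCRX_REFILL */
#define ZCRX_MAX_REFILL		(1u << 16)	/* IO_ZCRX_MAX_SYS_REFILL_BUFS */

/* Same field layout as struct io_uring_zcrx_rqe. */
struct zcrx_rqe {
	uint64_t off;
	uint32_t len;
	uint32_t pad;
};

/* Same field layout as struct io_uring_zcrx_refill from the patch. */
struct zcrx_refill {
	uint32_t zcrx_id;
	uint32_t nr_entries;
	uint64_t rqes;
	uint64_t resv[2];
};

/* Fill the argument struct; reserved fields must stay zero. */
static void zcrx_refill_prepare(struct zcrx_refill *zr, uint32_t zcrx_id,
				const struct zcrx_rqe *rqes, uint32_t nr)
{
	assert(nr > 0 && nr <= ZCRX_MAX_REFILL);

	memset(zr, 0, sizeof(*zr));
	zr->zcrx_id = zcrx_id;
	zr->nr_entries = nr;
	zr->rqes = (uint64_t)(uintptr_t)rqes;
}

/*
 * Synchronously return nr buffers. On success the syscall returns the
 * number of entries consumed; on failure it returns -1 and sets errno.
 */
long zcrx_refill_sync(int ring_fd, uint32_t zcrx_id,
		      const struct zcrx_rqe *rqes, uint32_t nr)
{
	struct zcrx_refill zr;

	zcrx_refill_prepare(&zr, zcrx_id, rqes, nr);
	/* The kernel side rejects a non-zero nr_args with -EINVAL. */
	return syscall(__NR_io_uring_register, ring_fd,
		       ZCRX_REGISTER_REFILL, &zr, 0);
}
```

A caller should be prepared for a return value smaller than nr, since
the kernel loop returns the count handled so far if a later batch
faults. Also, as posted, each iteration copies sizeof(rqes), i.e. a
full 32-entry batch, from user memory even when fewer entries remain,
so an array whose length is not a multiple of 32 may be read past its
end.
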
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 18:20 UTC
To: io-uring, Pavel Begunkov

On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
> Returning buffers via a ring is efficient but can cause problems
> when the ring doesn't have space. Add a way to return buffers
> synchronously via io_uring "register" syscall, which should serve
> as a slow fallback path.
>
> For a full branch with all relevant dependencies see
> https://github.com/isilence/linux.git zcrx/for-next
>
> [...]

Applied, thanks!

[1/2] io_uring/zcrx: introduce io_parse_rqe()
      commit: 55e5f6bb0ddfbb5912fec373ed99f9e02b19e3ea
[2/2] io_uring/zcrx: allow synchronous buffer return
      commit: ebbedae5b04f4f4a4d79d2d7329baab74b5a0564

Best regards,
-- 
Jens Axboe

* Re: [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-20 18:57 UTC
To: Jens Axboe, io-uring

On 8/20/25 19:20, Jens Axboe wrote:
>
> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>> Returning buffers via a ring is efficient but can cause problems
>> when the ring doesn't have space. Add a way to return buffers
>> synchronously via io_uring "register" syscall, which should serve
>> as a slow fallback path.
>>
>> For a full branch with all relevant dependencies see
>> https://github.com/isilence/linux.git zcrx/for-next
>>
>> [...]
>
> Applied, thanks!

Leave these and other series out please. I sent them for
review, but it'll be easier to keep in a branch and rebase
it if necessary. I hoped tagging with zcrx-next would be
good enough of a sign.

-- 
Pavel Begunkov

* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 19:02 UTC
To: Pavel Begunkov, io-uring

On 8/20/25 12:57 PM, Pavel Begunkov wrote:
> On 8/20/25 19:20, Jens Axboe wrote:
>>
>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>> Returning buffers via a ring is efficient but can cause problems
>>> when the ring doesn't have space. Add a way to return buffers
>>> synchronously via io_uring "register" syscall, which should serve
>>> as a slow fallback path.
>>>
>>> For a full branch with all relevant dependencies see
>>> https://github.com/isilence/linux.git zcrx/for-next
>>>
>>> [...]
>>
>> Applied, thanks!
>
> Leave these and other series out please. I sent them for
> review, but it'll be easier to keep in a branch and rebase
> it if necessary. I hoped tagging with zcrx-next would be
> good enough of a sign.

Alright, but then please make it clear when you are sending
something out for merging.

-- 
Jens Axboe

* Re: [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-20 19:33 UTC
To: Jens Axboe, io-uring

On 8/20/25 20:02, Jens Axboe wrote:
> On 8/20/25 12:57 PM, Pavel Begunkov wrote:
>> On 8/20/25 19:20, Jens Axboe wrote:
>>>
>>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>>> Returning buffers via a ring is efficient but can cause problems
>>>> when the ring doesn't have space. Add a way to return buffers
>>>> synchronously via io_uring "register" syscall, which should serve
>>>> as a slow fallback path.
>>>>
>>>> For a full branch with all relevant dependencies see
>>>> https://github.com/isilence/linux.git zcrx/for-next
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>
>> Leave these and other series out please. I sent them for
>> review, but it'll be easier to keep in a branch and rebase
>> it if necessary. I hoped tagging with zcrx-next would be
>> good enough of a sign.
>
> Alright, but then please make it clear when you are sending
> something out for merging.

That would be a pull request, should be obvious enough. Perhaps
it'd make sense to tag for review only patches. Any preference
what tag it should be or otherwise?

-- 
Pavel Begunkov

* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 19:36 UTC
To: Pavel Begunkov, io-uring

On 8/20/25 1:33 PM, Pavel Begunkov wrote:
> On 8/20/25 20:02, Jens Axboe wrote:
>> On 8/20/25 12:57 PM, Pavel Begunkov wrote:
>>> On 8/20/25 19:20, Jens Axboe wrote:
>>>>
>>>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>>>> Returning buffers via a ring is efficient but can cause problems
>>>>> when the ring doesn't have space. Add a way to return buffers
>>>>> synchronously via io_uring "register" syscall, which should serve
>>>>> as a slow fallback path.
>>>>>
>>>>> For a full branch with all relevant dependencies see
>>>>> https://github.com/isilence/linux.git zcrx/for-next
>>>>>
>>>>> [...]
>>>>
>>>> Applied, thanks!
>>>
>>> Leave these and other series out please. I sent them for
>>> review, but it'll be easier to keep in a branch and rebase
>>> it if necessary. I hoped tagging with zcrx-next would be
>>> good enough of a sign.
>>
>> Alright, but then please make it clear when you are sending
>> something out for merging.
>
> That would be a pull request, should be obvious enough. Perhaps
> it'd make sense to tag for review only patches. Any preference
> what tag it should be or otherwise?

Please just send patches rather than a pull request. PRs get messed up
by people all the time, and there's no picking and choosing there, it's
all or nothing. A review tag would be good in the subject, anything
easily recognizable should do it. Like for-review or whatever, doesn't
matter to me.

-- 
Jens Axboe

Thread overview: 8+ messages
2025-08-17 22:44 [zcrx-next 0/2] add support for synchronous refill Pavel Begunkov
2025-08-17 22:44 ` [zcrx-next 1/2] io_uring/zcrx: introduce io_parse_rqe() Pavel Begunkov
2025-08-17 22:44 ` [zcrx-next 2/2] io_uring/zcrx: allow synchronous buffer return Pavel Begunkov
2025-08-20 18:20 ` [zcrx-next 0/2] add support for synchronous refill Jens Axboe
2025-08-20 18:57   ` Pavel Begunkov
2025-08-20 19:02     ` Jens Axboe
2025-08-20 19:33       ` Pavel Begunkov
2025-08-20 19:36         ` Jens Axboe