* [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence
Returning buffers via a ring is efficient but can cause problems
when the ring doesn't have space. Add a way to return buffers
synchronously via io_uring "register" syscall, which should serve
as a slow fallback path.
For a full branch with all relevant dependencies see
https://github.com/isilence/linux.git zcrx/for-next
Pavel Begunkov (2):
io_uring/zcrx: introduce io_parse_rqe()
io_uring/zcrx: allow synchronous buffer return
include/uapi/linux/io_uring.h | 10 ++++
io_uring/register.c | 3 +
io_uring/zcrx.c | 100 +++++++++++++++++++++++++++++-----
io_uring/zcrx.h | 7 +++
4 files changed, 107 insertions(+), 13 deletions(-)
--
2.49.0
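
For readers who want to see what the fallback path might look like from userspace, here is a minimal sketch. It assumes the opcode value (35) and the io_uring_zcrx_refill layout proposed in patch 2/2; installed uapi headers may not define either yet, so the sketch carries local mirrors. Every sketch_* name is hypothetical and exists only for this illustration. Per the patch, nr_args must be 0 and a successful call returns the number of entries processed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: opcode value as proposed in patch 2/2. */
#define SKETCH_IORING_REGISTER_ZCRX_REFILL 35

/* Local mirror of struct io_uring_zcrx_rqe (off, len, __pad). */
struct sketch_zcrx_rqe {
	uint64_t off;
	uint32_t len;
	uint32_t pad;
};

/* Local mirror of struct io_uring_zcrx_refill as proposed in patch 2/2. */
struct sketch_zcrx_refill {
	uint32_t zcrx_id;
	uint32_t nr_entries;
	uint64_t rqes;		/* user pointer to an array of rqes */
	uint64_t resv[2];	/* must be zero, the kernel rejects anything else */
};

/* Fill the argument structure; reserved fields are zeroed by the memset. */
static void sketch_refill_prepare(struct sketch_zcrx_refill *zr,
				  uint32_t zcrx_id,
				  const struct sketch_zcrx_rqe *rqes,
				  uint32_t nr)
{
	assert(zr && rqes);
	memset(zr, 0, sizeof(*zr));
	zr->zcrx_id = zcrx_id;
	zr->nr_entries = nr;
	zr->rqes = (uint64_t)(uintptr_t)rqes;
}

/*
 * Slow-path refill: hand nr buffers back through the register syscall.
 * Returns the syscall result: entries processed, or -1 with errno set.
 */
static long sketch_refill_sync(int ring_fd, uint32_t zcrx_id,
			       const struct sketch_zcrx_rqe *rqes, uint32_t nr)
{
	struct sketch_zcrx_refill zr;

	sketch_refill_prepare(&zr, zcrx_id, rqes, nr);
	return syscall(__NR_io_uring_register, ring_fd,
		       SKETCH_IORING_REGISTER_ZCRX_REFILL, &zr, 0);
}
```

An application would presumably call something like sketch_refill_sync() only after finding the refill ring full, and keep the ring as the normal return path.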
* [zcrx-next 1/2] io_uring/zcrx: introduce io_parse_rqe()
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence
Add a helper for verifying an rqe and extracting a niov out of it. It'll
be reused in the following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index b3cfe0c04920..d510ebc3d382 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -727,6 +727,28 @@ static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq,
return &ifq->rqes[idx];
}
+static inline bool io_parse_rqe(struct io_uring_zcrx_rqe *rqe,
+ struct io_zcrx_ifq *ifq,
+ struct net_iov **ret_niov)
+{
+ unsigned niov_idx, area_idx;
+ struct io_zcrx_area *area;
+
+ area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
+ niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
+
+ if (unlikely(rqe->__pad || area_idx))
+ return false;
+ area = ifq->area;
+
+ if (unlikely(niov_idx >= area->nia.num_niovs))
+ return false;
+ niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
+
+ *ret_niov = &area->nia.niovs[niov_idx];
+ return true;
+}
+
static void io_zcrx_ring_refill(struct page_pool *pp,
struct io_zcrx_ifq *ifq)
{
@@ -741,23 +763,11 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
do {
struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
- struct io_zcrx_area *area;
struct net_iov *niov;
- unsigned niov_idx, area_idx;
netmem_ref netmem;
- area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
- niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
-
- if (unlikely(rqe->__pad || area_idx))
+ if (!io_parse_rqe(rqe, ifq, &niov))
continue;
- area = ifq->area;
-
- if (unlikely(niov_idx >= area->nia.num_niovs))
- continue;
- niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
-
- niov = &area->nia.niovs[niov_idx];
if (!io_zcrx_put_niov_uref(niov))
continue;
--
2.49.0
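
The offset decoding done by io_parse_rqe() can be modelled in plain userspace C. This is only a sketch: SKETCH_AREA_SHIFT is assumed to match IORING_ZCRX_AREA_SHIFT (48) from the uapi header, the struct and function names are hypothetical, and the speculation barrier (array_index_nospec) is left out because it has no userspace equivalent here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors IORING_ZCRX_AREA_SHIFT / IORING_ZCRX_AREA_MASK. */
#define SKETCH_AREA_SHIFT 48
#define SKETCH_AREA_MASK (~(((uint64_t)1 << SKETCH_AREA_SHIFT) - 1))

/* Local mirror of struct io_uring_zcrx_rqe. */
struct sketch_rqe {
	uint64_t off;
	uint32_t len;
	uint32_t pad;
};

/*
 * Validate an rqe and extract the buffer index from it.
 * The upper bits of off select the area, the lower bits are a byte
 * offset that is shifted down to a niov index.
 */
static bool sketch_parse_rqe(const struct sketch_rqe *rqe,
			     unsigned niov_shift, uint32_t num_niovs,
			     uint32_t *ret_idx)
{
	uint64_t area_idx = rqe->off >> SKETCH_AREA_SHIFT;
	uint64_t niov_idx = (rqe->off & ~SKETCH_AREA_MASK) >> niov_shift;

	/* Padding must be zero and only area 0 is accepted. */
	if (rqe->pad || area_idx)
		return false;
	/* Reject indices past the end of the area. */
	if (niov_idx >= num_niovs)
		return false;

	*ret_idx = (uint32_t)niov_idx;
	return true;
}
```

With a niov_shift of 12 (4 KiB buffers), an offset of 3 * 4096 + 100 decodes to index 3; the byte offset within the buffer is discarded.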
* [zcrx-next 2/2] io_uring/zcrx: allow synchronous buffer return
From: Pavel Begunkov @ 2025-08-17 22:44 UTC
To: io-uring; +Cc: asml.silence
Returning buffers via a ring is performant and convenient, but it
becomes a problem if the user misconfigures the ring size and the ring
fills up. Add a synchronous way to return buffers to the page pool via
a new register opcode. It's meant to be a reliable slow path for
refilling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 10 ++++++
io_uring/register.c | 3 ++
io_uring/zcrx.c | 64 +++++++++++++++++++++++++++++++++++
io_uring/zcrx.h | 7 ++++
4 files changed, 84 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6957dc539d83..97b206df4cc1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -665,6 +665,9 @@ enum io_uring_register_op {
IORING_REGISTER_MEM_REGION = 34,
+ /* return zcrx buffers back into circulation */
+ IORING_REGISTER_ZCRX_REFILL = 35,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -1046,6 +1049,13 @@ struct io_uring_zcrx_ifq_reg {
__u64 __resv[3];
};
+struct io_uring_zcrx_refill {
+ __u32 zcrx_id;
+ __u32 nr_entries;
+ __u64 rqes;
+ __u64 __resv[2];
+};
+
#ifdef __cplusplus
}
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index a59589249fce..5155ea627f65 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -835,6 +835,9 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
ret = io_register_mem_region(ctx, arg);
break;
+ case IORING_REGISTER_ZCRX_REFILL:
+ ret = io_zcrx_return_bufs(ctx, arg, nr_args);
+ break;
default:
ret = -EINVAL;
break;
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index d510ebc3d382..4540e5cd7430 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -922,6 +922,70 @@ static const struct memory_provider_ops io_uring_pp_zc_ops = {
.uninstall = io_pp_uninstall,
};
+#define IO_ZCRX_MAX_SYS_REFILL_BUFS (1 << 16)
+#define IO_ZCRX_SYS_REFILL_BATCH 32
+
+static void io_return_buffers(struct io_zcrx_ifq *ifq,
+ struct io_uring_zcrx_rqe *rqes, unsigned nr)
+{
+ int i;
+
+ for (i = 0; i < nr; i++) {
+ struct net_iov *niov;
+ netmem_ref netmem;
+
+ if (!io_parse_rqe(&rqes[i], ifq, &niov))
+ continue;
+
+ scoped_guard(spinlock_bh, &ifq->rq_lock) {
+ if (!io_zcrx_put_niov_uref(niov))
+ continue;
+ }
+
+ netmem = net_iov_to_netmem(niov);
+ if (!page_pool_unref_and_test(netmem))
+ continue;
+ io_zcrx_return_niov(niov);
+ }
+}
+
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg)
+{
+ struct io_uring_zcrx_rqe rqes[IO_ZCRX_SYS_REFILL_BATCH];
+ struct io_uring_zcrx_rqe __user *urqes;
+ struct io_uring_zcrx_refill zr;
+ struct io_zcrx_ifq *ifq;
+ unsigned nr, i;
+
+ if (nr_arg)
+ return -EINVAL;
+ if (copy_from_user(&zr, arg, sizeof(zr)))
+ return -EFAULT;
+ if (!zr.nr_entries || zr.nr_entries > IO_ZCRX_MAX_SYS_REFILL_BUFS)
+ return -EINVAL;
+ if (!mem_is_zero(&zr.__resv, sizeof(zr.__resv)))
+ return -EINVAL;
+
+ ifq = xa_load(&ctx->zcrx_ctxs, zr.zcrx_id);
+ if (!ifq)
+ return -EINVAL;
+ nr = zr.nr_entries;
+ urqes = u64_to_user_ptr(zr.rqes);
+
+ for (i = 0; i < nr;) {
+ unsigned batch = min(nr - i, IO_ZCRX_SYS_REFILL_BATCH);
+
+ if (copy_from_user(rqes, urqes + i, sizeof(rqes)))
+ return i ? i : -EFAULT;
+ io_return_buffers(ifq, rqes, batch);
+
+ i += batch;
+ cond_resched();
+ }
+ return nr;
+}
+
static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
struct io_zcrx_ifq *ifq, int off, int len)
{
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index a48871b5adad..33ef61503092 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -63,6 +63,8 @@ struct io_zcrx_ifq {
};
#if defined(CONFIG_IO_URING_ZCRX)
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg);
int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg);
void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx);
@@ -95,6 +97,11 @@ static inline struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ct
{
return NULL;
}
+static inline int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg)
+{
+ return -EOPNOTSUPP;
+}
#endif
int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
--
2.49.0
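
The batching loop in io_zcrx_return_bufs() can be sketched in userspace as below, with memcpy() standing in for copy_from_user() and a hypothetical callback standing in for io_return_buffers(). All sketch_* names are made up for this illustration. One deliberate difference: the sketch sizes each copy by the current batch, whereas the posted hunk passes sizeof(rqes) to copy_from_user(). Treating the per-batch size as the intended behaviour is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REFILL_BATCH 32

/* Local mirror of struct io_uring_zcrx_rqe. */
struct sketch_batch_rqe {
	uint64_t off;
	uint32_t len;
	uint32_t pad;
};

/* Hypothetical stand-in for io_return_buffers(). */
typedef void (*sketch_consume_fn)(const struct sketch_batch_rqe *rqes,
				  unsigned nr, void *priv);

/*
 * Walk nr entries in chunks of at most SKETCH_REFILL_BATCH, copying each
 * chunk into a small on-stack array before handing it to the consumer.
 */
static unsigned sketch_refill_batched(const struct sketch_batch_rqe *src,
				      unsigned nr,
				      sketch_consume_fn consume, void *priv)
{
	struct sketch_batch_rqe tmp[SKETCH_REFILL_BATCH];
	unsigned i = 0;

	while (i < nr) {
		unsigned left = nr - i;
		unsigned batch = left < SKETCH_REFILL_BATCH ?
				 left : SKETCH_REFILL_BATCH;

		/* Copy only the entries in this batch, not the whole array. */
		memcpy(tmp, src + i, batch * sizeof(tmp[0]));
		consume(tmp, batch, priv);
		i += batch;
	}
	return nr;
}

/* Example consumer: count how many entries were handed over. */
static void sketch_count_entries(const struct sketch_batch_rqe *rqes,
				 unsigned nr, void *priv)
{
	(void)rqes;
	*(unsigned *)priv += nr;
}
```

Seventy entries would be processed as batches of 32, 32 and 6, so the last copy touches only six entries of the source array.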
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 18:20 UTC
To: io-uring, Pavel Begunkov
On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
> Returning buffers via a ring is efficient but can cause problems
> when the ring doesn't have space. Add a way to return buffers
> synchronously via io_uring "register" syscall, which should serve
> as a slow fallback path.
>
> For a full branch with all relevant dependencies see
> https://github.com/isilence/linux.git zcrx/for-next
>
> [...]
Applied, thanks!
[1/2] io_uring/zcrx: introduce io_parse_rqe()
commit: 55e5f6bb0ddfbb5912fec373ed99f9e02b19e3ea
[2/2] io_uring/zcrx: allow synchronous buffer return
commit: ebbedae5b04f4f4a4d79d2d7329baab74b5a0564
Best regards,
--
Jens Axboe
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-20 18:57 UTC
To: Jens Axboe, io-uring
On 8/20/25 19:20, Jens Axboe wrote:
>
> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>> Returning buffers via a ring is efficient but can cause problems
>> when the ring doesn't have space. Add a way to return buffers
>> synchronously via io_uring "register" syscall, which should serve
>> as a slow fallback path.
>>
>> For a full branch with all relevant dependencies see
>> https://github.com/isilence/linux.git zcrx/for-next
>>
>> [...]
>
> Applied, thanks!
Leave these and other series out please. I sent them for
review, but it'll be easier to keep in a branch and rebase
it if necessary. I hoped tagging with zcrx-next would be
good enough of a sign.
--
Pavel Begunkov
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 19:02 UTC
To: Pavel Begunkov, io-uring
On 8/20/25 12:57 PM, Pavel Begunkov wrote:
> On 8/20/25 19:20, Jens Axboe wrote:
>>
>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>> Returning buffers via a ring is efficient but can cause problems
>>> when the ring doesn't have space. Add a way to return buffers
>>> synchronously via io_uring "register" syscall, which should serve
>>> as a slow fallback path.
>>>
>>> For a full branch with all relevant dependencies see
>>> https://github.com/isilence/linux.git zcrx/for-next
>>>
>>> [...]
>>
>> Applied, thanks!
>
> Leave these and other series out please. I sent them for
> review, but it'll be easier to keep in a branch and rebase
> it if necessary. I hoped tagging with zcrx-next would be
> good enough of a sign.
Alright, but then please make it clear when you are sending
something out for merging.
--
Jens Axboe
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Pavel Begunkov @ 2025-08-20 19:33 UTC
To: Jens Axboe, io-uring
On 8/20/25 20:02, Jens Axboe wrote:
> On 8/20/25 12:57 PM, Pavel Begunkov wrote:
>> On 8/20/25 19:20, Jens Axboe wrote:
>>>
>>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>>> Returning buffers via a ring is efficient but can cause problems
>>>> when the ring doesn't have space. Add a way to return buffers
>>>> synchronously via io_uring "register" syscall, which should serve
>>>> as a slow fallback path.
>>>>
>>>> For a full branch with all relevant dependencies see
>>>> https://github.com/isilence/linux.git zcrx/for-next
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>
>> Leave these and other series out please. I sent them for
>> review, but it'll be easier to keep in a branch and rebase
>> it if necessary. I hoped tagging with zcrx-next would be
>> good enough of a sign.
>
> Alright, but then please make it clear when you are sending
> something out for merging.
That would be a pull request, should be obvious enough. Perhaps
it'd make sense to tag for review only patches. Any preference
what tag it should be or otherwise?
--
Pavel Begunkov
* Re: [zcrx-next 0/2] add support for synchronous refill
From: Jens Axboe @ 2025-08-20 19:36 UTC
To: Pavel Begunkov, io-uring
On 8/20/25 1:33 PM, Pavel Begunkov wrote:
> On 8/20/25 20:02, Jens Axboe wrote:
>> On 8/20/25 12:57 PM, Pavel Begunkov wrote:
>>> On 8/20/25 19:20, Jens Axboe wrote:
>>>>
>>>> On Sun, 17 Aug 2025 23:44:56 +0100, Pavel Begunkov wrote:
>>>>> Returning buffers via a ring is efficient but can cause problems
>>>>> when the ring doesn't have space. Add a way to return buffers
>>>>> synchronously via io_uring "register" syscall, which should serve
>>>>> as a slow fallback path.
>>>>>
>>>>> For a full branch with all relevant dependencies see
>>>>> https://github.com/isilence/linux.git zcrx/for-next
>>>>>
>>>>> [...]
>>>>
>>>> Applied, thanks!
>>>
>>> Leave these and other series out please. I sent them for
>>> review, but it'll be easier to keep in a branch and rebase
>>> it if necessary. I hoped tagging with zcrx-next would be
>>> good enough of a sign.
>>
>> Alright, but then please make it clear when you are sending
>> something out for merging.
>
> That would be a pull request, should be obvious enough. Perhaps
> it'd make sense to tag for review only patches. Any preference
> what tag it should be or otherwise?
Please just send patches rather than a pull request. PRs get messed up
by people all the time, and there's no picking and choosing there, it's
all or nothing.
A review tag would be good in the subject, anything easily recognizable
should do it. Like for-review or whatever, doesn't matter to me.
--
Jens Axboe
Thread overview: 8+ messages
2025-08-17 22:44 [zcrx-next 0/2] add support for synchronous refill Pavel Begunkov
2025-08-17 22:44 ` [zcrx-next 1/2] io_uring/zcrx: introduce io_parse_rqe() Pavel Begunkov
2025-08-17 22:44 ` [zcrx-next 2/2] io_uring/zcrx: allow synchronous buffer return Pavel Begunkov
2025-08-20 18:20 ` [zcrx-next 0/2] add support for synchronous refill Jens Axboe
2025-08-20 18:57 ` Pavel Begunkov
2025-08-20 19:02 ` Jens Axboe
2025-08-20 19:33 ` Pavel Begunkov
2025-08-20 19:36 ` Jens Axboe