From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, io-uring@vger.kernel.org
Cc: naup96721@gmail.com, stable@vger.kernel.org
Subject: Re: [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
Date: Wed, 11 Mar 2026 11:13:20 +0000
Message-ID: <39d1678f-7a7e-43d5-a92d-0b26b9bfd44e@gmail.com>
In-Reply-To: <20260310145521.68268-2-axboe@kernel.dk>

On 3/10/26 14:45, Jens Axboe wrote:
> If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
> the ring is being resized, it's possible for the OR'ing of
> IORING_SQ_TASKRUN to happen in the small window of swapping into the
> new rings and the old rings being freed.
>
> Prevent this by adding a 2nd ->rings pointer, ->rings_rcu, which is
> protected by RCU. The task work flags manipulation is inside RCU
> already, and if the resize ring freeing is done post an RCU synchronize,
> then there's no need to add locking to the fast path of task work
> additions.
>
> Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
> that supports ring resizing. If this ever changes, then they too need to
> use the io_ctx_mark_taskrun() helper.
>
> Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
> Cc: stable@vger.kernel.org
> Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
> Reported-by: Hao-Yu Yang <naup96721@gmail.com>
> Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> include/linux/io_uring_types.h | 1 +
> io_uring/io_uring.c | 2 ++
> io_uring/register.c | 20 ++++++++++++++++++--
> io_uring/tw.c | 24 ++++++++++++++++++++++--
> 4 files changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 3e4a82a6f817..dd1420bfcb73 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -388,6 +388,7 @@ struct io_ring_ctx {
>  	 * regularly bounce b/w CPUs.
>  	 */
>  	struct {
> +		struct io_rings __rcu *rings_rcu;
>  		struct llist_head work_llist;
>  		struct llist_head retry_llist;
>  		unsigned long check_cq;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index ccab8562d273..20fdc442e014 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2066,6 +2066,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>  	io_free_region(ctx->user, &ctx->sq_region);
>  	io_free_region(ctx->user, &ctx->ring_region);
>  	ctx->rings = NULL;
> +	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>  	ctx->sq_sqes = NULL;
>  }
>
> @@ -2703,6 +2704,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
>  	if (ret)
>  		return ret;
>  	ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
> +	rcu_assign_pointer(ctx->rings_rcu, rings);
>  	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
>  		ctx->sq_array = (u32 *)((char *)rings + rl->sq_array_offset);
>
> diff --git a/io_uring/register.c b/io_uring/register.c
> index a839b22fd392..5f2985ba0879 100644
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -487,6 +487,18 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
> IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
> IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
>
> +static void io_resize_assign_rings(struct io_ring_ctx *ctx, struct io_rings *rings)
> +{
> +	/*
> +	 * Just mark any flag we may have missed and that the application
> +	 * should act on unconditionally. Worst case it'll be an extra
> +	 * syscall.
> +	 */
> +	atomic_or(IORING_SQ_TASKRUN | IORING_SQ_NEED_WAKEUP, &rings->sq_flags);
> +	ctx->rings = rings;
> +	rcu_assign_pointer(ctx->rings_rcu, rings);
> +}
> +
>  static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>  {
>  	struct io_ctx_config config;
> @@ -579,6 +591,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>  	spin_lock(&ctx->completion_lock);
>  	o.rings = ctx->rings;
>  	ctx->rings = NULL;
> +	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>  	o.sq_sqes = ctx->sq_sqes;
>  	ctx->sq_sqes = NULL;

It would be better not to have a transient NULL here; then there
is no need to check for it in task_work. I.e. don't zero
->rings_rcu, and only assign the new value once a new set of
rings has been created successfully.
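
To illustrate the idea, here is a minimal userspace sketch, not kernel
code. It assumes C11 atomics as a stand-in for rcu_assign_pointer() and
rcu_dereference(), and it does not model the RCU grace period that the
real resize path needs before freeing the old rings. All sketch_* names
and SKETCH_SQ_TASKRUN are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for IORING_SQ_TASKRUN; the value is illustrative. */
#define SKETCH_SQ_TASKRUN	(1U << 2)

/* Hypothetical stand-ins for struct io_rings and struct io_ring_ctx. */
struct sketch_rings {
	atomic_uint sq_flags;
};

struct sketch_ctx {
	/* Once set up, this pointer is never NULL, so readers need no check. */
	_Atomic(struct sketch_rings *) rings_rcu;
};

/*
 * Resize side: a NULL new_rings models a failed setup. On failure the
 * published pointer is left untouched. On success the new rings are
 * published with a single exchange, so there is no transient NULL.
 */
static int sketch_resize(struct sketch_ctx *ctx, struct sketch_rings *new_rings,
			 struct sketch_rings **old_rings)
{
	if (!new_rings)
		return -1;

	/* Mark a flag that may have been missed around the swap. */
	atomic_fetch_or(&new_rings->sq_flags, SKETCH_SQ_TASKRUN);

	/* The caller may free *old_rings only after readers are done with it. */
	*old_rings = atomic_exchange(&ctx->rings_rcu, new_rings);
	return 0;
}

/* Task work side: load the pointer and set the flag, with no NULL check. */
static void sketch_mark_taskrun(struct sketch_ctx *ctx)
{
	struct sketch_rings *rings = atomic_load(&ctx->rings_rcu);

	atomic_fetch_or(&rings->sq_flags, SKETCH_SQ_TASKRUN);
}
```

With this shape, a failed resize leaves readers on the old rings, and a
successful one moves them straight to the new rings.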
--
Pavel Begunkov
Thread overview: 8+ messages
2026-03-10 14:45 [PATCHSET 0/2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
2026-03-10 14:45 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
2026-03-11 11:13 ` Pavel Begunkov [this message]
2026-03-11 13:05 ` Jens Axboe
2026-03-10 14:45 ` [PATCH 2/2] io_uring/eventfd: use ctx->rings_rcu for flags checking Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2026-03-11 13:11 [PATCHSET v2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
2026-03-11 13:11 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
2026-03-11 15:06 ` Keith Busch
2026-03-11 15:12 ` Jens Axboe