* [PATCH 0/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-03  3:26 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: io-uring, linux-kernel, Caleb Sander Mateos

As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
an io_uring doesn't actually enable any additional optimizations (aside
from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
submits SQEs to skip taking the uring_lock mutex in the submission and
task work paths.

First, we need to close a hole in the IORING_SETUP_SINGLE_ISSUER checks
where IORING_REGISTER_CLONE_BUFFERS only checks whether the thread is
allowed to access one of the two io_urings. It assumes the uring_lock
will prevent concurrent access to the other io_uring, but this will no
longer be the case after the optimization to skip taking uring_lock.

We also need to remove the unused filetable.h #include from io_uring.h
to avoid an #include cycle.

Caleb Sander Mateos (4):
  io_uring: don't include filetable.h in io_uring.h
  io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  io_uring: factor out uring_lock helpers
  io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

 io_uring/cancel.c    |  1 +
 io_uring/fdinfo.c    |  2 +-
 io_uring/filetable.c |  3 ++-
 io_uring/io_uring.c  | 58 +++++++++++++++++++++++++++-----------------
 io_uring/io_uring.h  | 43 ++++++++++++++++++++++++++------
 io_uring/kbuf.c      |  6 ++---
 io_uring/net.c       |  1 +
 io_uring/notif.c     |  5 ++--
 io_uring/notif.h     |  3 ++-
 io_uring/openclose.c |  1 +
 io_uring/poll.c      |  2 +-
 io_uring/register.c  |  1 +
 io_uring/rsrc.c      | 10 +++++++-
 io_uring/rsrc.h      |  3 ++-
 io_uring/rw.c        |  3 ++-
 io_uring/splice.c    |  1 +
 io_uring/waitid.c    |  2 +-
 17 files changed, 102 insertions(+), 43 deletions(-)

--
2.45.2
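For context, the single-issuer contract this series relies on is opt-in
at ring creation time. A minimal userspace sketch of opting in, using
liburing's io_uring_queue_init_params() (error handling elided):

    #include <liburing.h>

    /* Create a ring that promises a single submitting task. Pairing
     * IORING_SETUP_SINGLE_ISSUER with IORING_SETUP_DEFER_TASKRUN is
     * the common configuration, since the latter requires the former. */
    int setup_single_issuer_ring(struct io_uring *ring, unsigned entries)
    {
            struct io_uring_params p = { 0 };

            p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
            return io_uring_queue_init_params(entries, ring, &p);
    }

Every subsequent io_uring_enter() and io_uring_register() call must then
come from the task that enabled the ring, or the kernel fails the call
with -EEXIST.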
* [PATCH 1/4] io_uring: don't include filetable.h in io_uring.h
  2025-09-03  3:26 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: io-uring, linux-kernel, Caleb Sander Mateos

io_uring/io_uring.h doesn't use anything declared in
io_uring/filetable.h, so drop the unnecessary #include. Add filetable.h
includes in .c files previously relying on the transitive include from
io_uring.h.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/cancel.c    | 1 +
 io_uring/fdinfo.c    | 2 +-
 io_uring/io_uring.c  | 1 +
 io_uring/io_uring.h  | 1 -
 io_uring/net.c       | 1 +
 io_uring/openclose.c | 1 +
 io_uring/register.c  | 1 +
 io_uring/rsrc.c      | 1 +
 io_uring/rw.c        | 1 +
 io_uring/splice.c    | 1 +
 10 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 6d57602304df..64b51e82baa2 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -9,10 +9,11 @@
 #include <linux/nospec.h>
 #include <linux/io_uring.h>

 #include <uapi/linux/io_uring.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "tctx.h"
 #include "poll.h"
 #include "timeout.h"
 #include "waitid.h"
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 5c7339838769..ff3364531c77 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -7,11 +7,11 @@
 #include <linux/seq_file.h>
 #include <linux/io_uring.h>

 #include <uapi/linux/io_uring.h>

-#include "io_uring.h"
+#include "filetable.h"
 #include "sqpoll.h"
 #include "fdinfo.h"
 #include "cancel.h"
 #include "rsrc.h"
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 545a7d5eefec..9c1190b19adf 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -77,10 +77,11 @@
 #include <uapi/linux/io_uring.h>

 #include "io-wq.h"
+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "refs.h"
 #include "tctx.h"
 #include "register.h"
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index fa8a66b34d4e..d62b7d9fafed 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -9,11 +9,10 @@
 #include <linux/io_uring_types.h>
 #include <uapi/linux/eventpoll.h>
 #include "alloc_cache.h"
 #include "io-wq.h"
 #include "slist.h"
-#include "filetable.h"
 #include "opdef.h"

 #ifndef CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index d2ca49ceb79d..cf4bf4a2264b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -8,10 +8,11 @@
 #include <net/compat.h>
 #include <linux/io_uring.h>

 #include <uapi/linux/io_uring.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "kbuf.h"
 #include "alloc_cache.h"
 #include "net.h"
 #include "notif.h"
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index d70700e5cef8..bfeb91b31bba 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -12,10 +12,11 @@
 #include <uapi/linux/io_uring.h>

 #include "../fs/internal.h"

+#include "filetable.h"
 #include "io_uring.h"
 #include "rsrc.h"
 #include "openclose.h"

 struct io_open {
diff --git a/io_uring/register.c b/io_uring/register.c
index aa5f56ad8358..5e493917a1a8 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -16,10 +16,11 @@
 #include <linux/nospec.h>
 #include <linux/compat.h>
 #include <linux/io_uring.h>
 #include <linux/io_uring_types.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "tctx.h"
 #include "rsrc.h"
 #include "sqpoll.h"
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index f75f5e43fa4a..2d15b8785a95 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -11,10 +11,11 @@
 #include <linux/io_uring.h>
 #include <linux/io_uring/cmd.h>

 #include <uapi/linux/io_uring.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "openclose.h"
 #include "rsrc.h"
 #include "memmap.h"
 #include "register.h"
diff --git a/io_uring/rw.c b/io_uring/rw.c
index dcde5bb7421a..ab6b4afccec3 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -13,10 +13,11 @@
 #include <linux/io_uring/cmd.h>
 #include <linux/indirect_call_wrapper.h>

 #include <uapi/linux/io_uring.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "kbuf.h"
 #include "alloc_cache.h"
 #include "rsrc.h"
diff --git a/io_uring/splice.c b/io_uring/splice.c
index 35ce4e60b495..e81ebbb91925 100644
--- a/io_uring/splice.c
+++ b/io_uring/splice.c
@@ -9,10 +9,11 @@
 #include <linux/io_uring.h>
 #include <linux/splice.h>

 #include <uapi/linux/io_uring.h>

+#include "filetable.h"
 #include "io_uring.h"
 #include "splice.h"

 struct io_splice {
 	struct file *file_out;

--
2.45.2
* [PATCH 2/4] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  2025-09-03  3:26 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: io-uring, linux-kernel, Caleb Sander Mateos

An io_ring_ctx enabled with IORING_SETUP_SINGLE_ISSUER allows only a
single task to submit to the ctx. Although the documentation only
mentions this restriction applying to io_uring_enter() syscalls, commit
d7cce96c449e ("io_uring: limit registration w/ SINGLE_ISSUER") extends
it to io_uring_register(). Ensuring only one task interacts with the
io_ring_ctx will be important to allow this task to avoid taking the
uring_lock.

There is, however, one gap in these checks: io_register_clone_buffers()
may take the uring_lock on a second (source) io_ring_ctx, but
__io_uring_register() only checks the current thread against the
*destination* io_ring_ctx's submitter_task. Fail
IORING_REGISTER_CLONE_BUFFERS with -EEXIST if the source io_ring_ctx
has a registered submitter_task other than the current task.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/rsrc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 2d15b8785a95..1e5b7833076a 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1298,14 +1298,21 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)

 	src_ctx = file->private_data;
 	if (src_ctx != ctx) {
 		mutex_unlock(&ctx->uring_lock);
 		lock_two_rings(ctx, src_ctx);
+
+		if (src_ctx->submitter_task &&
+		    src_ctx->submitter_task != current) {
+			ret = -EEXIST;
+			goto out;
+		}
 	}

 	ret = io_clone_buffers(ctx, src_ctx, &buf);

+out:
 	if (src_ctx != ctx)
 		mutex_unlock(&src_ctx->uring_lock);

 	fput(file);
 	return ret;

--
2.45.2
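Seen from userspace, the new failure mode looks like this -- a sketch
assuming liburing provides the io_uring_clone_buffers(dst, src) wrapper
(recent releases do; it returns a negative errno on failure):

    #include <stdio.h>
    #include <liburing.h>

    /* Sketch: clone src's registered buffers into dst. If src was set
     * up with IORING_SETUP_SINGLE_ISSUER and is owned by a different
     * task, the kernel now fails the registration with -EEXIST instead
     * of relying on uring_lock alone for exclusion. */
    static int clone_bufs(struct io_uring *dst, struct io_uring *src)
    {
            int ret = io_uring_clone_buffers(dst, src);

            if (ret == -EEXIST)
                    fprintf(stderr, "src ring owned by another task\n");
            return ret;
    }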
* [PATCH 3/4] io_uring: factor out uring_lock helpers
  2025-09-03  3:26 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: io-uring, linux-kernel, Caleb Sander Mateos

A subsequent commit will skip acquiring the io_ring_ctx uring_lock in
io_uring_enter() and io_handle_tw_list() for
IORING_SETUP_SINGLE_ISSUER. Prepare for this change by factoring out
the uring_lock accesses under these functions into helper functions:
- io_ring_ctx_lock() for mutex_lock(&ctx->uring_lock)
- io_ring_ctx_unlock() for mutex_unlock(&ctx->uring_lock)
- io_ring_ctx_assert_locked() for lockdep_assert_held(&ctx->uring_lock)

For now, the helpers unconditionally call the mutex functions. But a
subsequent commit will condition them on !IORING_SETUP_SINGLE_ISSUER.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/filetable.c |  3 ++-
 io_uring/io_uring.c  | 51 ++++++++++++++++++++++++++------------------
 io_uring/io_uring.h  | 28 ++++++++++++++++++------
 io_uring/kbuf.c      |  6 +++---
 io_uring/notif.c     |  5 +++--
 io_uring/notif.h     |  3 ++-
 io_uring/poll.c      |  2 +-
 io_uring/rsrc.c      |  2 +-
 io_uring/rsrc.h      |  3 ++-
 io_uring/rw.c        |  2 +-
 io_uring/waitid.c    |  2 +-
 11 files changed, 67 insertions(+), 40 deletions(-)

diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index a21660e3145a..aae283e77856 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -55,14 +55,15 @@ void io_free_file_tables(struct io_ring_ctx *ctx, struct io_file_table *table)
 	table->bitmap = NULL;
 }

 static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 				 u32 slot_index)
-	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_rsrc_node *node;

+	io_ring_ctx_assert_locked(ctx);
+
 	if (io_is_uring_fops(file))
 		return -EBADF;
 	if (!ctx->file_table.data.nr)
 		return -ENXIO;
 	if (slot_index >= ctx->file_table.data.nr)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9c1190b19adf..7f19b6da5d3d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -554,11 +554,11 @@ static unsigned io_linked_nr(struct io_kiocb *req)

 static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx)
 {
 	bool drain_seen = false, first = true;

-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	__io_req_caches_free(ctx);

 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
@@ -925,11 +925,11 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
 /*
  * Must be called from inline task_work so we now a flush will happen later,
  * and obviously with ctx->uring_lock held (tw always has that).
  */
 void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	lockdep_assert(ctx->lockless_cq);

 	if (!io_fill_cqe_aux(ctx, user_data, res, cflags)) {
 		struct io_cqe cqe = io_init_cqe(user_data, res, cflags);
@@ -954,11 +954,11 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
 	 */
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs))
 		__io_submit_flush_completions(ctx);

 	lockdep_assert(!io_wq_current_is_worker());
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	if (!ctx->lockless_cq) {
 		spin_lock(&ctx->completion_lock);
 		posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags);
 		spin_unlock(&ctx->completion_lock);
@@ -978,11 +978,11 @@ bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	bool posted;

 	lockdep_assert(!io_wq_current_is_worker());
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	cqe[0].user_data = req->cqe.user_data;
 	if (!ctx->lockless_cq) {
 		spin_lock(&ctx->completion_lock);
 		posted = io_fill_cqe_aux32(ctx, cqe);
@@ -1032,15 +1032,14 @@ static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 	 */
 	req_ref_put(req);
 }

 void io_req_defer_failed(struct io_kiocb *req, s32 res)
-	__must_hold(&ctx->uring_lock)
 {
 	const struct io_cold_def *def = &io_cold_defs[req->opcode];

-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);

 	req_set_fail(req);
 	io_req_set_res(req, res, io_put_kbuf(req, res, NULL));
 	if (def->fail)
 		def->fail(req);
@@ -1052,16 +1051,17 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res)
  * handlers and io_issue_sqe() are done with it, e.g. inline completion path.
  * Because of that, io_alloc_req() should be called only under ->uring_lock
  * and with extra caution to not get a request that is still worked on.
  */
 __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO;
 	void *reqs[IO_REQ_ALLOC_BATCH];
 	int ret;

+	io_ring_ctx_assert_locked(ctx);
+
 	ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);

 	/*
 	 * Bulk alloc is all-or-nothing. If we fail to get a batch,
 	 * retry single alloc to be on the safe side.
@@ -1126,11 +1126,11 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
 		return;
 	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 		atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);

 	io_submit_flush_completions(ctx);
-	mutex_unlock(&ctx->uring_lock);
+	io_ring_ctx_unlock(ctx);
 	percpu_ref_put(&ctx->refs);
 }

 /*
  * Run queued task_work, returning the number of entries processed in *count.
@@ -1150,11 +1150,11 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
 						    io_task_work.node);

 		if (req->ctx != ctx) {
 			ctx_flush_and_put(ctx, ts);
 			ctx = req->ctx;
-			mutex_lock(&ctx->uring_lock);
+			io_ring_ctx_lock(ctx);
 			percpu_ref_get(&ctx->refs);
 		}

 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func, io_req_rw_complete,
 				req, ts);
@@ -1502,12 +1502,13 @@ static inline void io_req_put_rsrc_nodes(struct io_kiocb *req)
 	io_put_rsrc_node(req->ctx, req->buf_node);
 }

 static void io_free_batch_list(struct io_ring_ctx *ctx,
 			       struct io_wq_work_node *node)
-	__must_hold(&ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(ctx);
+
 	do {
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    comp_list);

 		if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
@@ -1543,15 +1544,16 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
 		io_req_add_to_cache(req, ctx);
 	} while (node);
 }

 void __io_submit_flush_completions(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_state *state = &ctx->submit_state;
 	struct io_wq_work_node *node;

+	io_ring_ctx_assert_locked(ctx);
+
 	__io_cq_lock(ctx);
 	__wq_list_for_each(node, &state->compl_reqs) {
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    comp_list);

@@ -1767,16 +1769,17 @@ io_req_flags_t io_file_get_flags(struct file *file)
 		res |= REQ_F_SUPPORT_NOWAIT;
 	return res;
 }

 static __cold void io_drain_req(struct io_kiocb *req)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	bool drain = req->flags & IOSQE_IO_DRAIN;
 	struct io_defer_entry *de;

+	io_ring_ctx_assert_locked(ctx);
+
 	de = kmalloc(sizeof(*de), GFP_KERNEL_ACCOUNT);
 	if (!de) {
 		io_req_defer_failed(req, -ENOMEM);
 		return;
 	}
@@ -2043,12 +2046,13 @@ static int io_req_sqe_copy(struct io_kiocb *req, unsigned int issue_flags)
 	def->sqe_copy(req);
 	return 0;
 }

 static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int ret)
-	__must_hold(&req->ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(req->ctx);
+
 	if (ret != -EAGAIN || (req->flags & REQ_F_NOWAIT)) {
 fail:
 		io_req_defer_failed(req, ret);
 		return;
 	}
@@ -2068,16 +2072,17 @@ static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int r
 		break;
 	}
 }

 static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
-	__must_hold(&req->ctx->uring_lock)
 {
 	unsigned int issue_flags = IO_URING_F_NONBLOCK |
 				   IO_URING_F_COMPLETE_DEFER | extra_flags;
 	int ret;

+	io_ring_ctx_assert_locked(req->ctx);
+
 	ret = io_issue_sqe(req, issue_flags);

 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
 	 * doesn't support non-blocking read/write attempts
@@ -2085,12 +2090,13 @@ static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
 	if (unlikely(ret))
 		io_queue_async(req, issue_flags, ret);
 }

 static void io_queue_sqe_fallback(struct io_kiocb *req)
-	__must_hold(&req->ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(req->ctx);
+
 	if (unlikely(req->flags & REQ_F_FAIL)) {
 		/*
 		 * We don't submit, fail them all, for that replace hardlinks
 		 * with normal links. Extra REQ_F_LINK is tolerated.
 		 */
@@ -2155,17 +2161,18 @@ static __cold int io_init_fail_req(struct io_kiocb *req, int err)
 	return err;
 }

 static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		       const struct io_uring_sqe *sqe)
-	__must_hold(&ctx->uring_lock)
 {
 	const struct io_issue_def *def;
 	unsigned int sqe_flags;
 	int personality;
 	u8 opcode;

+	io_ring_ctx_assert_locked(ctx);
+
 	req->ctx = ctx;
 	req->opcode = opcode = READ_ONCE(sqe->opcode);
 	/* same numerical values with corresponding REQ_F_*, safe to copy */
 	sqe_flags = READ_ONCE(sqe->flags);
 	req->flags = (__force io_req_flags_t) sqe_flags;
@@ -2290,15 +2297,16 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 	return 0;
 }

 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 				const struct io_uring_sqe *sqe)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
 	int ret;

+	io_ring_ctx_assert_locked(ctx);
+
 	ret = io_init_req(ctx, req, sqe);
 	if (unlikely(ret))
 		return io_submit_fail_init(sqe, req, ret);

 	trace_io_uring_submit_req(req);
@@ -2419,16 +2427,17 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	*sqe = &ctx->sq_sqes[head];
 	return true;
 }

 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
-	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
 	unsigned int left;
 	int ret;

+	io_ring_ctx_assert_locked(ctx);
+
 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
 	ret = left = min(nr, entries);
 	io_get_task_refs(left);
@@ -3518,14 +3527,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	} else if (to_submit) {
 		ret = io_uring_add_tctx_node(ctx);
 		if (unlikely(ret))
 			goto out;

-		mutex_lock(&ctx->uring_lock);
+		io_ring_ctx_lock(ctx);
 		ret = io_submit_sqes(ctx, to_submit);
 		if (ret != to_submit) {
-			mutex_unlock(&ctx->uring_lock);
+			io_ring_ctx_unlock(ctx);
 			goto out;
 		}

 		if (flags & IORING_ENTER_GETEVENTS) {
 			if (ctx->syscall_iopoll)
 				goto iopoll_locked;
@@ -3534,11 +3543,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			 * it should handle ownership problems if any.
 			 */
 			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 				(void)io_run_local_work_locked(ctx,
 							       min_complete);
 		}
-		mutex_unlock(&ctx->uring_lock);
+		io_ring_ctx_unlock(ctx);
 	}

 	if (flags & IORING_ENTER_GETEVENTS) {
 		int ret2;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d62b7d9fafed..a0580a1bf6b5 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -119,20 +119,35 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
 			bool cancel_all);

 void io_activate_pollwq(struct io_ring_ctx *ctx);

+static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
+{
+	mutex_lock(&ctx->uring_lock);
+}
+
+static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
+{
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
+{
+	lockdep_assert_held(&ctx->uring_lock);
+}
+
 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 {
 #if defined(CONFIG_PROVE_LOCKING)
 	lockdep_assert(in_task());

 	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
-		lockdep_assert_held(&ctx->uring_lock);
+		io_ring_ctx_assert_locked(ctx);

 	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		lockdep_assert_held(&ctx->uring_lock);
+		io_ring_ctx_assert_locked(ctx);
 	} else if (!ctx->task_complete) {
 		lockdep_assert_held(&ctx->completion_lock);
 	} else if (ctx->submitter_task) {
 		/*
 		 * ->submitter_task may be NULL and we can still post a CQE,
@@ -300,11 +315,11 @@ static inline void io_put_file(struct io_kiocb *req)
 }

 static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx,
 					 unsigned issue_flags)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
 		mutex_unlock(&ctx->uring_lock);
 }

 static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
@@ -316,11 +331,11 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
 	 * The only exception is when we've detached the request and issue it
 	 * from an async worker thread, grab the lock for that case.
 	 */
 	if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
 		mutex_lock(&ctx->uring_lock);
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 }

 static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	/* order cqe stores with ring update */
@@ -428,24 +443,23 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
 	return task_work_pending(current) || io_local_work_pending(ctx);
 }

 static inline void io_tw_lock(struct io_ring_ctx *ctx, io_tw_token_t tw)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 }

 /*
  * Don't complete immediately but use deferred completion infrastructure.
  * Protected by ->uring_lock and can only be used either with
  * IO_URING_F_COMPLETE_DEFER or inside a tw handler holding the mutex.
  */
 static inline void io_req_complete_defer(struct io_kiocb *req)
-	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_submit_state *state = &req->ctx->submit_state;

-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);

 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }

 static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 3e9aab21af9d..ea6f3588d875 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -68,11 +68,11 @@ bool io_kbuf_commit(struct io_kiocb *req,
 }

 static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
 							unsigned int bgid)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	return xa_load(&ctx->io_bl_xa, bgid);
 }

 static int io_buffer_add_list(struct io_ring_ctx *ctx,
@@ -337,11 +337,11 @@ int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_buffer_list *bl;
 	int ret;

-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	bl = io_buffer_get_list(ctx, arg->buf_group);
 	if (unlikely(!bl))
 		return -ENOENT;

@@ -393,11 +393,11 @@ static int io_remove_buffers_legacy(struct io_ring_ctx *ctx,
 {
 	unsigned long i = 0;
 	struct io_buffer *nxt;

 	/* protects io_buffers_cache */
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	WARN_ON_ONCE(bl->flags & IOBL_BUF_RING);

 	for (i = 0; i < nbufs && !list_empty(&bl->buf_list); i++) {
 		nxt = list_first_entry(&bl->buf_list, struct io_buffer, list);
 		list_del(&nxt->list);
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 8c92e9cde2c6..9dd248fcb213 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -14,11 +14,11 @@ static const struct ubuf_info_ops io_ubuf_ops;
 static void io_notif_tw_complete(struct io_kiocb *notif, io_tw_token_t tw)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 	struct io_ring_ctx *ctx = notif->ctx;

-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	do {
 		notif = cmd_to_io_kiocb(nd);

 		if (WARN_ON_ONCE(ctx != notif->ctx))
@@ -108,15 +108,16 @@ static const struct ubuf_info_ops io_ubuf_ops = {
 	.complete = io_tx_ubuf_complete,
 	.link_skb = io_link_skb,
 };

 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_kiocb *notif;
 	struct io_notif_data *nd;

+	io_ring_ctx_assert_locked(ctx);
+
 	if (unlikely(!io_alloc_req(ctx, &notif)))
 		return NULL;
 	notif->ctx = ctx;
 	notif->opcode = IORING_OP_NOP;
 	notif->flags = 0;
diff --git a/io_uring/notif.h b/io_uring/notif.h
index f3589cfef4a9..c33c9a1179c9 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -31,14 +31,15 @@ static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif)
 {
 	return io_kiocb_to_cmd(notif, struct io_notif_data);
 }

 static inline void io_notif_flush(struct io_kiocb *notif)
-	__must_hold(&notif->ctx->uring_lock)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);

+	io_ring_ctx_assert_locked(notif->ctx);
+
 	io_tx_ubuf_complete(NULL, &nd->uarg, true);
 }

 static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
 {
diff --git a/io_uring/poll.c b/io_uring/poll.c
index ea75c5cd81a0..ba71403c8fd8 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -121,11 +121,11 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 static void io_poll_req_insert(struct io_kiocb *req)
 {
 	struct io_hash_table *table = &req->ctx->cancel_table;
 	u32 index = hash_long(req->cqe.user_data, table->hash_bits);

-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);

 	hlist_add_head(&req->hash_node, &table->hbs[index].list);
 }

 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 1e5b7833076a..1c1753de7340 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -347,11 +347,11 @@ static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
 				     struct io_uring_rsrc_update2 *up,
 				     unsigned nr_args)
 {
 	__u32 tmp;

-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);

 	if (check_add_overflow(up->offset, nr_args, &tmp))
 		return -EOVERFLOW;

 	switch (type) {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index a3ca6ba66596..d537a3b895d6 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -2,10 +2,11 @@
 #ifndef IOU_RSRC_H
 #define IOU_RSRC_H

 #include <linux/io_uring_types.h>
 #include <linux/lockdep.h>

+#include "io_uring.h"

 #define IO_VEC_CACHE_SOFT_CAP	256

 enum {
 	IORING_RSRC_FILE		= 0,
@@ -97,11 +98,11 @@ static inline struct io_rsrc_node *io_rsrc_node_lookup(struct io_rsrc_data *data
 	return NULL;
 }

 static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	if (!--node->refs)
 		io_free_rsrc_node(ctx, node);
 }

 static inline bool io_reset_rsrc_node(struct io_ring_ctx *ctx,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index ab6b4afccec3..f00e02a02dc7 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -461,11 +461,11 @@ int io_read_mshot_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }

 void io_readv_writev_cleanup(struct io_kiocb *req)
 {
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 	io_rw_recycle(req, 0);
 }

 static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
 {
diff --git a/io_uring/waitid.c b/io_uring/waitid.c
index 26c118f3918d..f7a5054d4d81 100644
--- a/io_uring/waitid.c
+++ b/io_uring/waitid.c
@@ -114,11 +114,11 @@ static void io_waitid_complete(struct io_kiocb *req, int ret)
 	struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);

 	/* anyone completing better be holding a reference */
 	WARN_ON_ONCE(!(atomic_read(&iw->refs) & IO_WAITID_REF_MASK));

-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);

 	hlist_del_init(&req->hash_node);
 	ret = io_waitid_finish(req, ret);
 	if (ret < 0)

--
2.45.2
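Taken together, the helpers this patch introduces in io_uring.h are
deliberately thin, unconditional wrappers -- collected here from the
diff above for easy reference:

    static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
    {
            mutex_lock(&ctx->uring_lock);
    }

    static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
    {
            mutex_unlock(&ctx->uring_lock);
    }

    static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
    {
            lockdep_assert_held(&ctx->uring_lock);
    }

Because every call site now goes through these three functions, the
next patch can change the locking policy in exactly one place.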
* [PATCH 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-03  3:26 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: io-uring, linux-kernel, Caleb Sander Mateos

io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
workloads. Even when only one thread pinned to a single CPU is
accessing the io_ring_ctx, the atomic CAS required to lock and unlock
the mutex is a very hot instruction.

The mutex's primary purpose is to prevent concurrent io_uring system
calls on the same io_ring_ctx. However, there is already a flag
IORING_SETUP_SINGLE_ISSUER that promises only one task will make
io_uring_enter() and io_uring_register() system calls on the
io_ring_ctx once it's enabled. So if the io_ring_ctx is set up with
IORING_SETUP_SINGLE_ISSUER, skip the uring_lock mutex_lock() and
mutex_unlock() for the io_uring_enter() submission as well as for
io_handle_tw_list(). io_uring_enter() submission calls
__io_uring_add_tctx_node_from_submit() to verify the current task
matches submitter_task for IORING_SETUP_SINGLE_ISSUER. And task work
can only be scheduled on tasks that submit io_uring requests, so
io_handle_tw_list() will also only be called on submitter_task.

There is a goto from the io_uring_enter() submission to the middle of
the IOPOLL block which assumed the uring_lock would already be held.
This is no longer the case for IORING_SETUP_SINGLE_ISSUER, so goto the
preceding mutex_lock() in that case.

It may be possible to avoid taking uring_lock in other places too for
IORING_SETUP_SINGLE_ISSUER, but these two cover the primary hot paths.
The uring_lock in io_uring_register() is necessary at least before the
io_uring is enabled because submitter_task isn't set yet. uring_lock
is also used to synchronize IOPOLL on submitting tasks with io_uring
worker tasks, so it's still needed there. But in principle, it should
be possible to remove the mutex entirely for
IORING_SETUP_SINGLE_ISSUER by running any code needing exclusive
access to the io_ring_ctx in task work context on submitter_task.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/io_uring.c |  6 +++++-
 io_uring/io_uring.h | 14 ++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7f19b6da5d3d..5793f6122159 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3534,12 +3534,15 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		if (ret != to_submit) {
 			io_ring_ctx_unlock(ctx);
 			goto out;
 		}

 		if (flags & IORING_ENTER_GETEVENTS) {
-			if (ctx->syscall_iopoll)
+			if (ctx->syscall_iopoll) {
+				if (ctx->flags & IORING_SETUP_SINGLE_ISSUER)
+					goto iopoll;
 				goto iopoll_locked;
+			}
 			/*
 			 * Ignore errors, we'll soon call io_cqring_wait() and
 			 * it should handle ownership problems if any.
 			 */
 			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
@@ -3556,10 +3559,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			 * We disallow the app entering submit/complete with
 			 * polling, but we still need to lock the ring to
 			 * prevent racing with polled issue that got punted to
 			 * a workqueue.
 			 */
+iopoll:
 			mutex_lock(&ctx->uring_lock);
 iopoll_locked:
 			ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 			if (likely(!ret2))
 				ret2 = io_iopoll_check(ctx, min_complete);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index a0580a1bf6b5..7296b12b0897 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -121,20 +121,34 @@ bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
 void io_activate_pollwq(struct io_ring_ctx *ctx);

 static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
+		WARN_ON_ONCE(current != ctx->submitter_task);
+		return;
+	}
+
 	mutex_lock(&ctx->uring_lock);
 }

 static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
+		WARN_ON_ONCE(current != ctx->submitter_task);
+		return;
+	}
+
 	mutex_unlock(&ctx->uring_lock);
 }

 static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+	    current == ctx->submitter_task)
+		return;
+
 	lockdep_assert_held(&ctx->uring_lock);
 }

 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 {

--
2.45.2
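The shape of the optimization is easy to see outside the kernel. Below
is a userspace analogue of the conditional helpers, using pthreads --
purely illustrative, with stand-in names (fake_ring, issuer) rather
than kernel types:

    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>

    struct fake_ring {
            pthread_mutex_t lock;
            bool single_issuer;     /* stands in for IORING_SETUP_SINGLE_ISSUER */
            pthread_t issuer;       /* stands in for ctx->submitter_task */
    };

    static void ring_lock(struct fake_ring *r)
    {
            if (r->single_issuer) {
                    /* only the registered issuer may be here; skip the mutex */
                    assert(pthread_equal(pthread_self(), r->issuer));
                    return;
            }
            pthread_mutex_lock(&r->lock);
    }

    static void ring_unlock(struct fake_ring *r)
    {
            if (r->single_issuer) {
                    assert(pthread_equal(pthread_self(), r->issuer));
                    return;
            }
            pthread_mutex_unlock(&r->lock);
    }

The single-issuer promise plays the role the mutex played: if only one
thread can ever reach the critical section, the atomic lock/unlock pair
is pure overhead, and the assertion documents (and, in debug builds,
enforces) the invariant that replaced it.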
* [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-03 21:55 UTC
From: syzbot ci
To: axboe, csander, io-uring, linux-kernel
Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
https://lore.kernel.org/all/20250903032656.2012337-1-csander@purestorage.com
* [PATCH 1/4] io_uring: don't include filetable.h in io_uring.h
* [PATCH 2/4] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
* [PATCH 3/4] io_uring: factor out uring_lock helpers
* [PATCH 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

and found the following issue:
WARNING in io_handle_tw_list

Full report is available here:
https://ci.syzbot.org/series/54ae0eae-5e47-4cfe-9ae7-9eaaf959b5ae

***

WARNING in io_handle_tw_list

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      5d50cf9f7cf20a17ac469c20a2e07c29c1f6aab7
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/1de646dd-4ee2-418d-9c62-617d88ed4fd2/config
syz repro: https://ci.syzbot.org/findings/e229a878-375f-4286-89fe-b6724c23addd/syz_repro

------------[ cut here ]------------
WARNING: io_uring/io_uring.h:127 at io_ring_ctx_lock io_uring/io_uring.h:127 [inline], CPU#1: iou-sqp-6294/6297
WARNING: io_uring/io_uring.h:127 at io_handle_tw_list+0x234/0x2e0 io_uring/io_uring.c:1155, CPU#1: iou-sqp-6294/6297
Modules linked in:
CPU: 1 UID: 0 PID: 6297 Comm: iou-sqp-6294 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:127 [inline]
RIP: 0010:io_handle_tw_list+0x234/0x2e0 io_uring/io_uring.c:1155
Code: 00 00 48 c7 c7 e0 90 02 8c be 8e 04 00 00 31 d2 e8 01 e5 d2 fc 2e 2e 2e 31 c0 45 31 e4 4d 85 ff 75 89 eb 7c e8 ad fb 00 fd 90 <0f> 0b 90 e9 cf fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 22 ff
RSP: 0018:ffffc900032cf938 EFLAGS: 00010293
RAX: ffffffff84bfcba3 RBX: dffffc0000000000 RCX: ffff888107f61cc0
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
RBP: ffff8881119a8008 R08: ffff888110bb69c7 R09: 1ffff11022176d38
R10: dffffc0000000000 R11: ffffed1022176d39 R12: ffff8881119a8000
R13: ffff888108441e90 R14: ffff888107f61cc0 R15: 0000000000000000
FS:  00007f81f25716c0(0000) GS:ffff8881a39f5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b63fff CR3: 000000010f24c000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 tctx_task_work_run+0x99/0x370 io_uring/io_uring.c:1223
 io_sq_tw io_uring/sqpoll.c:244 [inline]
 io_sq_thread+0xed1/0x1e50 io_uring/sqpoll.c:327
 ret_from_fork+0x47f/0x820 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-03 23:29 UTC
From: Jens Axboe <axboe@kernel.dk>
To: syzbot ci, csander, io-uring, linux-kernel
Cc: syzbot, syzkaller-bugs

On 9/3/25 3:55 PM, syzbot ci wrote:
> syzbot ci has tested the following series
>
> [v1] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> https://lore.kernel.org/all/20250903032656.2012337-1-csander@purestorage.com
>
> and found the following issue:
> WARNING in io_handle_tw_list
>
> [...]
>
> WARNING: io_uring/io_uring.h:127 at io_ring_ctx_lock io_uring/io_uring.h:127 [inline], CPU#1: iou-sqp-6294/6297
> WARNING: io_uring/io_uring.h:127 at io_handle_tw_list+0x234/0x2e0 io_uring/io_uring.c:1155, CPU#1: iou-sqp-6294/6297
>
> [...]
>
> Call Trace:
>  <TASK>
>  tctx_task_work_run+0x99/0x370 io_uring/io_uring.c:1223
>  io_sq_tw io_uring/sqpoll.c:244 [inline]
>  io_sq_thread+0xed1/0x1e50 io_uring/sqpoll.c:327
>  ret_from_fork+0x47f/0x820 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>

Probably the sanest thing to do here is to clear
IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
allow it, it'll be impossible to uphold the locking criteria on both
the issue and register side.

--
Jens Axboe
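What Jens is suggesting might look roughly like this -- a sketch only,
and the exact placement in the io_uring_setup() flag-validation path is
an assumption:

    /* Hypothetical placement: wherever setup flags are sanitized.
     * SQPOLL submits SQEs from the iou-sqp kernel thread, not the
     * registered submitter_task, so the "single issuer never needs
     * the mutex" reasoning cannot hold; silently drop the flag. */
    if ((p->flags & IORING_SETUP_SQPOLL) &&
        (p->flags & IORING_SETUP_SINGLE_ISSUER))
            p->flags &= ~IORING_SETUP_SINGLE_ISSUER;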
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 14:52 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs

On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 9/3/25 3:55 PM, syzbot ci wrote:
> [...]
>
> Probably the sanest thing to do here is to clear
> IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
> allow it, it'll be impossible to uphold the locking criteria on both
> the issue and register side.

Yup, I was thinking the same thing. Thanks for taking a look.

Best,
Caleb
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 16:46 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs

On Thu, Sep 4, 2025 at 7:52 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
> On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> [...]
> > Probably the sanest thing to do here is to clear
> > IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
> > allow it, it'll be impossible to uphold the locking criteria on both
> > the issue and register side.
>
> Yup, I was thinking the same thing. Thanks for taking a look.

On further thought, IORING_SETUP_SQPOLL actually does guarantee a
single issuer. io_uring_enter() already avoids taking the uring_lock
in the IORING_SETUP_SQPOLL case because it doesn't issue any SQEs
itself. Only the SQ thread does that, so it *is* the single issuer.
The assertions I added in io_ring_ctx_lock()/io_ring_ctx_unlock() are
just unnecessarily strict. They should expect current ==
ctx->sq_data->thread in the IORING_SETUP_SQPOLL case.

Best,
Caleb
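A relaxed assertion along the lines Caleb describes might resolve the
expected issuer per ring -- a sketch, where io_ctx_issuer() is a
hypothetical helper (ctx->sq_data->thread is the real SQPOLL thread
pointer in the kernel):

    static inline struct task_struct *io_ctx_issuer(struct io_ring_ctx *ctx)
    {
            /* under SQPOLL, the iou-sqp thread is the sole submitter */
            if (ctx->flags & IORING_SETUP_SQPOLL)
                    return ctx->sq_data ? ctx->sq_data->thread : NULL;
            return ctx->submitter_task;
    }

The WARN_ON_ONCE() checks in io_ring_ctx_lock()/io_ring_ctx_unlock()
would then compare current against io_ctx_issuer(ctx) rather than
ctx->submitter_task directly.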
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 16:50 UTC
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe
Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs

On Thu, Sep 4, 2025 at 9:46 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
> [...]
>
> On further thought, IORING_SETUP_SQPOLL actually does guarantee a
> single issuer. io_uring_enter() already avoids taking the uring_lock
> in the IORING_SETUP_SQPOLL case because it doesn't issue any SQEs
> itself. Only the SQ thread does that, so it *is* the single issuer.
> The assertions I added in io_ring_ctx_lock()/io_ring_ctx_unlock() are
> just unnecessarily strict. They should expect current ==
> ctx->sq_data->thread in the IORING_SETUP_SQPOLL case.

Oh, but you are totally correct about needing the mutex to synchronize
between issue on the SQ thread and io_uring_register() on other
threads. Yeah, I don't see an easy way to avoid taking the mutex on
the SQ thread unless we disallowed io_uring_register() completely.
Clearing IORING_SETUP_SINGLE_ISSUER seems like the best option for
now.

Best,
Caleb