* Re: [SECURITY] io_uring UAF: io_uring_cmd_issue_blocking missing sqe copy before RESIZE_RINGS
From: Jens Axboe @ 2026-05-04 12:29 UTC
To: Carlo Conti; +Cc: io-uring
On 5/4/26 6:16 AM, Jens Axboe wrote:
> On 5/4/26 5:59 AM, Carlo Conti wrote:
>> Hello,
>>
>> I have identified a Use-After-Free vulnerability in the Linux kernel
>> io_uring subsystem, confirmed on Linux 6.19.11.
>>
>> The primary finding is the UAF itself, both a read and a write
>> primitive have been confirmed experimentally. As a secondary research
>> step, I also attempted to build a privilege escalation chain on top of
>> these primitives. The LPE reaches root but with a structural
>> limitation described below; I am reporting it in this state because
>> the UAF primitives are reliable and independently exploitable, and I
>> believe the escalation path warrants further investigation by the
>> kernel team.
>>
>> --- ROOT CAUSE ---
>>
>> io_uring_cmd_issue_blocking() in fs/io_uring.c calls
>
> fs/io_uring.c hasn't been a thing in many years?
>
>> io_req_queue_iowq(req) without first calling io_req_sqe_copy(). As a
>> result, the ioucmd->sqe field continues to point to the original
>> sq_sqes buffer.
>>
>> If IORING_REGISTER_RESIZE_RINGS (opcode 33) is issued immediately
>> after, io_free_region() frees the old sq_sqes page. When the io-wq
>> worker is eventually scheduled, it reads sqe->ioprio and other fields
>> from the freed page -> Use-After-Free read.
>>
>> Additional issue: io_free_region() does not call zap_vma_ptes() for
>> single-page non-vmap regions, so the userspace mmap of the old sq_sqes
>> (IORING_OFF_SQES) remains mapped to the freed physical page, providing
>> an arbitrary write primitive -> Use-After-Free write.
>
> Can you expand? Seems unrelated and also unlikely to be an issue, unless
> it's missing something explicitly.
>>
>> --- PRIMITIVES ---
>>
>> UAF read: confirmed. ioprio=0xFFFF written to live page before RESIZE;
>> worker reads it from freed page -> blkdev_uring_cmd() returns
>> -EINVAL. CQE ud=0x2222 res=-22 observed.
>>
>> UAF write: confirmed. Arbitrary write to freed page via stale PTE,
>> verified with sentinel probe (new mmap of IORING_OFF_SQES
>> does not see the sentinel written via the old mmap).
>>
>> --- LPE STATUS ---
>>
>> Cross-cache struct cred overwrite is blocked by a refcount invariant:
>> vm_insert_pages() increments page refcount 1->2 at mmap time;
>> io_free_region()'s release_pages() decrements 2->1 (not to 0). The page
>> remains KPF_MMAP=1, KPF_BUDDY=0 while the stale mmap is open, so
>> cred_jar cannot allocate it. Closing the mmap frees the page but
>> eliminates the write primitive (fundamental mutual exclusion).
>>
>> A fully unprivileged LPE would require a second independent write
>> primitive or a different victim object. Current PoC achieves root via
>> a research-context SUID binary to demonstrate the primitive chain.
>
>>
>> --- REPRODUCTION ---
>>
>> Requirements:
>> modprobe null_blk queue_mode=2 submit_queues=1 home_node=0 completion_nsec=500000000 queue_depth=1 discard=1 size=1024
>> chmod a+rw /dev/nullb0
>
> So... you need to be root in the first place here?
>
> Since a) you need to be root, and b) you sent this to both the public
> list and the security list, why don't you just send a patch for this?
>
> Taking security@ off the CC.
Since this is clearly just LLM hallucinations for the most part,
rather than waste my time on this, here's my LLM replying to yours.
tldr - nothing burger, send a patch if you want. If I don't hear back,
I'll add the copy part myself.
Analysis of "io_uring UAF" report ("Carlo")
=====================================================
Verdict: structurally bogus. The "UAF" isn't a UAF, and the reporter
contradicts themselves about it.
What's actually true in the report
----------------------------------
There is a real code-path bug: io_uring_cmd_issue_blocking() at
io_uring/uring_cmd.c:325 queues to iowq without calling io_req_sqe_copy().
For the bdev discard nowait -> partial -> re-issue path
(block/ioctl.c:864-873), this means iowq later re-issues blkdev_uring_cmd()
which still reads cmd->sqe from the user's SQ ring.
Why "UAF" is wrong
------------------
The reporter contradicts their own headline in the LPE STATUS section:

    vm_insert_pages() increments page refcount 1->2 at mmap time;
    io_free_region()'s release_pages() decrements 2->1 (not to 0).
    The page remains KPF_MMAP=1, KPF_BUDDY=0 while the stale mmap is open

That is exactly correct - and exactly why this is NOT a UAF:
1. io_region_init_ptr() (memmap.c:114) for single-page non-highmem
   regions sets mr->ptr = page_address(pages[0]) - the kernel linear-map
   address. That stays valid as long as the page is alive.
2. The user's PTE (installed via vm_insert_pages in io_region_mmap())
   holds a refcount on the page.
3. io_free_region() drops the kernel's ref via release_pages(). The page
   is still alive because the user mapping holds it.
4. So when iowq reads cmd->sqe, it reads the same physical page the user
   is writing to - not a freed page. KASAN won't fire.
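
To make the refcount argument in points 1-4 concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code: struct model_page and
the model_*() helpers are made-up names that only mimic the get/put
accounting described above, assuming the mmap takes one extra reference and
the region free drops exactly one.

```c
/* Userspace model only, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_page {
    int refcount;           /* stands in for the page refcount */
    bool freed;             /* set when the last reference goes away */
    unsigned char data[64]; /* stands in for the page contents */
};

/* Allocation hands back a page holding the "kernel" reference. */
struct model_page *model_alloc_page(void)
{
    struct model_page *p = calloc(1, sizeof(*p));

    assert(p != NULL);
    p->refcount = 1;
    return p;
}

/* Taking a reference on a live page (models the user mapping). */
void model_get_page(struct model_page *p)
{
    assert(!p->freed);
    p->refcount++;
}

/* Dropping a reference; the page only dies when the count hits zero. */
void model_put_page(struct model_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true; /* marked, not free()d, so callers can inspect it */
}

/*
 * Walk the sequence from the report: map, release the region, unmap.
 * Returns true if the page was still alive after the region release
 * and only died once the mapping reference was dropped too.
 */
bool model_resize_keeps_page_alive(void)
{
    struct model_page *p = model_alloc_page(); /* refcount 1 */
    bool alive_after_release;
    bool freed_after_unmap;

    model_get_page(p);                         /* mmap:         1 -> 2 */
    model_put_page(p);                         /* region freed: 2 -> 1 */
    alive_after_release = !p->freed && p->refcount == 1;

    model_put_page(p);                         /* munmap:       1 -> 0 */
    freed_after_unmap = p->freed;

    free(p);
    return alive_after_release && freed_after_unmap;
}
```

In this model the page is only marked freed by the final put, which is the
same mutual exclusion the reporter describes: as long as the mapping
reference exists, nothing else can be handed that page.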
The "UAF read" demo (writing ioprio=0xFFFF and seeing -EINVAL come out
of blkdev_uring_cmd) doesn't prove a UAF - it just proves that the user
can mutate the SQE between submission and re-issue (which is the
underlying sqe_copy bug, with or without RESIZE). The "UAF write" is
even sillier: writing through your own live mmap to your own
still-refcounted page is not a primitive - it's just memory you own.
Why the LPE claim falls apart
-----------------------------
The reporter admits the cred cross-cache is impossible:

    Closing the mmap frees the page but eliminates the write primitive
    (fundamental mutual exclusion)

Translation: "to free the page I need to drop the only thing keeping
the write working." Then:

    Current PoC achieves root via a research-context SUID binary

i.e. *given a custom SUID binary*, they reach root. That's not a
kernel-side LPE - that's "if you give me root, I have root."
Severity of the actual bug
--------------------------
The missing sqe_copy on the iowq re-issue path is real but benign:
- cmd_op is captured into cmd->cmd_op at prep (uring_cmd.c:206), so
  opcode behavior can't be flipped.
- blkdev_uring_cmd re-reads sqe->ioprio/__pad1/len/rw_flags/file_index
  for the must-be-zero check; tampering yields -EINVAL.
- It re-reads sqe->addr/addr3 for start/len. The user could change
  which range gets discarded on the retry - but they could've passed
  any range originally, so no privilege boundary is crossed.
Worth a small cleanup patch (call io_req_sqe_copy from the
blocking-issue path, or have io_uring_cmd_issue_blocking ensure the
copy), but not a security issue.
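
The effect of that cleanup can be sketched in userspace. Again this is a toy
model under stated assumptions: struct model_sqe, struct model_cmd and the
model_*() functions are hypothetical stand-ins, not the real io_uring
structures or the real io_req_sqe_copy(). The only behavior borrowed from
the analysis above is a must-be-zero check on ioprio that returns -EINVAL.

```c
/* Userspace model only, NOT kernel code. Layout and names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct model_sqe {
    uint16_t ioprio; /* must be zero for the modeled command */
    uint64_t addr;   /* start of the range */
    uint64_t addr3;  /* length of the range */
};

struct model_cmd {
    const struct model_sqe *sqe; /* what the handler reads at issue time */
    struct model_sqe copy;       /* request-private storage */
};

/* Prep: the command initially points straight at the shared ring entry. */
void model_prep(struct model_cmd *cmd, const struct model_sqe *ring_sqe)
{
    assert(ring_sqe != NULL);
    cmd->sqe = ring_sqe;
}

/* The missing step: snapshot the entry, then read only from the snapshot. */
void model_sqe_copy(struct model_cmd *cmd)
{
    if (cmd->sqe == &cmd->copy)
        return; /* already copied */
    memcpy(&cmd->copy, cmd->sqe, sizeof(cmd->copy));
    cmd->sqe = &cmd->copy;
}

/* Handler: must-be-zero check, then "success". */
int model_issue(const struct model_cmd *cmd)
{
    if (cmd->sqe->ioprio)
        return -EINVAL;
    return 0;
}

/*
 * Submit, optionally copy before deferring, let "userspace" rewrite its
 * own ring entry, then re-issue from the deferred context.
 */
int model_reissue_after_tamper(int do_copy)
{
    struct model_sqe ring = { .ioprio = 0, .addr = 0, .addr3 = 4096 };
    struct model_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    model_prep(&cmd, &ring);
    if (do_copy)
        model_sqe_copy(&cmd);

    ring.ioprio = 0xFFFF; /* tamper between submission and re-issue */
    return model_issue(&cmd);
}
```

Without the copy, model_reissue_after_tamper(0) sees the rewritten entry and
returns -EINVAL, which lines up with the res=-22 CQE in the report. With the
copy, model_reissue_after_tamper(1) returns 0 because the re-issue reads the
snapshot. Neither case touches memory the submitter didn't already own.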
Other smell
-----------
- Path "fs/io_uring.c" in the report is wrong; the file moved to
  io_uring/uring_cmd.c years ago. Suggests Carlo didn't actually read
  current source.
- The reproducer setup (null_blk discard=1 size=1024
  completion_nsec=500000000) is engineered specifically to get a
  partial-discard NOWAIT -> bio-completes -> bic->res=-EAGAIN ->
  io_uring_cmd_issue_blocking reissue window - i.e., it exercises the
  missing-sqe-copy path, not a memory-corruption primitive.
Bottom line: not a UAF, not exploitable, and the reporter's own
write-up demonstrates they know it. Fine to acknowledge the
missing-sqe-copy and address it, but push back on the security framing.
--
Jens Axboe