From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: io-uring@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jens Axboe <axboe@kernel.dk>
Subject: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 15:46:16 +0200
Message-ID: <2026042115-body-attention-d15b@gregkh>
Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.

The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page. Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.

Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them. The user can then write into whatever
the allocator hands out next.

Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close. NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap. If the region was unregistered between
mmap and munmap (region->pages is NULL after io_free_region's memset),
walk the VMA address range instead -- the pages are still live (our refs
kept them) and virt_to_page recovers them.

This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed. The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.

Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Note, I have no way of testing this, I'm only forwarding this on because
I got the bug report and was able to generate something that "seems"
correct, but it might be a total load of crap here, my knowledge of the
vm layer is very low so take this for where it is coming from (i.e. a
non-deterministic pattern matching system.)

I do have another patch that just disables io_uring for !MMU systems, if
you want that instead? Or is this feature something that !MMU devices
actually care about?
io_uring/memmap.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a..6818e9abf3b3 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,76 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
#else /* !CONFIG_MMU */
+/*
+ * Under NOMMU, get_unmapped_area returns the kernel virtual address of
+ * the io_mapped_region's backing pages directly -- the user's VMA
+ * aliases the kernel allocation rather than holding its own copy or
+ * page-table entries. The CONFIG_MMU path's vm_insert_pages() takes
+ * page references that survive until munmap; this path takes none, so
+ * io_unregister_pbuf_ring() -> io_free_region() -> release_pages()
+ * frees the pages while the user's VMA still maps them. The user can
+ * then write into whatever the buddy allocator hands out next.
+ *
+ * Mirror the MMU lifetime by taking page references in io_uring_mmap()
+ * and releasing them in vm_ops->close. We re-derive the region from
+ * vm_pgoff (same lookup get_unmapped_area used) so we know which pages
+ * to grab.
+ */
+
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	struct io_ring_ctx *ctx = vma->vm_file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	/*
+	 * The region may have been unregistered (memset to zero in
+	 * io_free_region()) between mmap and munmap. The page refs we
+	 * took in io_uring_mmap() are what kept the pages alive; release
+	 * them via the VMA range since the region->pages array is gone.
+	 */
+	if (region && region->pages) {
+		for (i = 0; i < region->nr_pages; i++)
+			put_page(region->pages[i]);
+	} else {
+		/* Region cleared; walk the VMA range. */
+		unsigned long a;
+
+		for (a = vma->vm_start; a < vma->vm_end; a += PAGE_SIZE)
+			put_page(virt_to_page((void *)a));
+	}
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
{
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
}
unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
--
2.53.0
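
For readers who want to check the lifetime argument in the changelog
without a NOMMU target, the following is a userspace toy model, not
kernel code. Every name in it (toy_page, toy_region, toy_vma and the
toy_* helpers) is made up for illustration, and a plain integer stands
in for the struct page refcount. It assumes the backing pages are
contiguous, which is what lets the close path walk the mapped range
after the region has been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES 4

/* Hypothetical stand-in for struct page: only a reference count. */
struct toy_page { int refcount; /* 0 == back in the allocator */ };

/* Hypothetical stand-in for io_mapped_region. */
struct toy_region {
	struct toy_page *pages[NR_PAGES];
	size_t nr_pages;
};

/* Hypothetical stand-in for a NOMMU VMA aliasing the kernel pages. */
struct toy_vma {
	struct toy_page *first;	/* plays the role of vm_start */
	size_t nr_pages;
	bool pinned;		/* did mmap take its own references? */
};

static void toy_get_page(struct toy_page *p) { p->refcount++; }

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Allocate the region: io_uring holds one reference per page. */
static void toy_region_init(struct toy_region *r, struct toy_page *backing)
{
	r->nr_pages = NR_PAGES;
	for (size_t i = 0; i < NR_PAGES; i++) {
		backing[i].refcount = 1;
		r->pages[i] = &backing[i];
	}
}

/* mmap: alias the pages; with take_refs, also pin them (patched case). */
static void toy_mmap(struct toy_vma *vma, struct toy_region *r, bool take_refs)
{
	vma->first = r->pages[0];
	vma->nr_pages = r->nr_pages;
	vma->pinned = take_refs;
	for (size_t i = 0; take_refs && i < r->nr_pages; i++)
		toy_get_page(r->pages[i]);
}

/* Unregister: drop the io_uring-side references and clear the region. */
static void toy_free_region(struct toy_region *r)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		toy_put_page(r->pages[i]);
	memset(r, 0, sizeof(*r));
}

/* munmap: walk the mapped range, since the region may be cleared. */
static void toy_vma_close(struct toy_vma *vma)
{
	for (size_t i = 0; vma->pinned && i < vma->nr_pages; i++)
		toy_put_page(&vma->first[i]);
}

/* True if the VMA still aliases a page whose refcount has hit zero. */
static bool toy_vma_dangling(const struct toy_vma *vma)
{
	for (size_t i = 0; i < vma->nr_pages; i++)
		if (vma->first[i].refcount == 0)
			return true;
	return false;
}

/* Scenario: unregister between mmap and munmap. */
static bool unregister_then_check(bool take_refs)
{
	struct toy_page backing[NR_PAGES];
	struct toy_region region;
	struct toy_vma vma;

	toy_region_init(&region, backing);
	toy_mmap(&vma, &region, take_refs);
	toy_free_region(&region);
	bool dangling = toy_vma_dangling(&vma);
	toy_vma_close(&vma);
	return dangling;
}

/* Scenario: two mmaps of one region, unregister, then close only one. */
static int refs_after_two_mmaps_one_close(void)
{
	struct toy_page backing[NR_PAGES];
	struct toy_region region;
	struct toy_vma a, b;

	toy_region_init(&region, backing);
	toy_mmap(&a, &region, true);
	toy_mmap(&b, &region, true);
	toy_free_region(&region);
	toy_vma_close(&a);
	return backing[0].refcount;	/* b's reference is still held */
}
```

In this model, unregister_then_check(false) corresponds to the current
NOMMU behaviour (the mapping is left pointing at freed pages) and
unregister_then_check(true) to the patched behaviour. The second
scenario mirrors the SQ_RING/CQ_RING case, where each mmap holds its
own references and the pages outlive the first close.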