Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps

From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 20:26:08 -0600
Message-ID: <9c20876f-1cdb-429a-abb3-5ddbcd8cac00@kernel.dk>
In-Reply-To: <f1b43e56-4724-4635-b18b-bae2add37936@kernel.dk>

On 4/21/26 7:56 PM, Jens Axboe wrote:
> On 4/21/26 7:17 PM, Jens Axboe wrote:
>> On 4/21/26 11:39 AM, Jens Axboe wrote:
>>>
>>> On Tue, 21 Apr 2026 15:46:16 +0200, Greg Kroah-Hartman wrote:
>>>> Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
>>>> virtual address of the io_mapped_region's backing pages directly;
>>>> the user's VMA aliases the kernel allocation. io_uring_mmap() then
>>>> just returns 0 -- it takes no page references.
>>>>
>>>> The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
>>>> each inserted page.  Those references are released when the VMA is torn
>>>> down (zap_pte_range -> put_page). io_free_region() -> release_pages()
>>>> drops the io_uring-side references, but the pages survive until munmap
>>>> drops the VMA-side references.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [1/1] io_uring: take page references for NOMMU pbuf_ring mmaps
>>>       commit: d9b7b3d9c5286a786c7fe8220c55a6e012088c2e
>>
>> Actually, I take that back - what prevents the io_mmap_get_region()
>> in the newly added io_uring_nommu_vm_close() from getting the same
>> region that we initially referenced the pages from in the nommu
>> variant of io_uring_mmap()?
> 
> I think we can get rid of that and simplify the code at the same
> time. Rather than need to re-lookup the buffer list, we can just iterate
> the pages mapped in the vma. Since this is a file backed mapping and
> io_uring doesn't allow remaps, that should always be the same.
> 
> Greg, can you test this? I will fold this in.

Here's the full patch; the incremental was missing a ')'. For good
measure, it now also checks that the vma size matches the number of
pages in the region, and fails the mmap with -EINVAL if it doesn't.
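
As a rough userspace model of the lifetime the patch is aiming for, a
sketch like the one below may help. It is not kernel code: the model_*
names, the MODEL_PAGE_* macros and the plain int refcount are all made
up for illustration, and the vma holds a pointer to its first page only
because a userspace model has no virt_to_page().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for struct page; only the refcount matters. */
struct model_page {
	int refcount;	/* 0 means the page went back to the allocator */
};

/* Hypothetical stand-in for io_mapped_region, pages kept contiguous. */
struct model_region {
	struct model_page *pages;
	unsigned long nr_pages;
};

/* Hypothetical stand-in for vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	struct model_page *first;	/* model only, replaces virt_to_page() */
};

/* Models the nommu io_uring_mmap(): size check, then one ref per page. */
static int model_mmap(struct model_vma *vma, struct model_region *region)
{
	unsigned long i;

	if ((vma->vm_end - vma->vm_start) !=
	    region->nr_pages << MODEL_PAGE_SHIFT)
		return -1;

	for (i = 0; i < region->nr_pages; i++)
		region->pages[i].refcount++;

	vma->first = region->pages;
	return 0;
}

/* Models io_free_region(): drops only the region-side references. */
static void model_free_region(struct model_region *region)
{
	unsigned long i;

	for (i = 0; i < region->nr_pages; i++)
		region->pages[i].refcount--;
	region->nr_pages = 0;
}

/* Models ->close: walk the vma range and drop the mmap-side refs. */
static void model_vm_close(struct model_vma *vma)
{
	unsigned long addr;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += MODEL_PAGE_SIZE)
		vma->first[(addr - vma->vm_start) >> MODEL_PAGE_SHIFT].refcount--;
}
```

In this model, freeing the region while a mapping still exists leaves
each page with one reference, and the count only reaches zero once the
mapping is closed as well. A mapping whose size doesn't match the region
is rejected before any reference is taken.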

commit d0be8884f56b0b800cd8966e37ce23417cd5044e
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Tue Apr 21 15:46:16 2026 +0200

    io_uring: take page references for NOMMU pbuf_ring mmaps
    
    Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
    virtual address of the io_mapped_region's backing pages directly;
    the user's VMA aliases the kernel allocation. io_uring_mmap() then
    just returns 0 -- it takes no page references.
    
    The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
    each inserted page.  Those references are released when the VMA is torn
    down (zap_pte_range -> put_page). io_free_region() -> release_pages()
    drops the io_uring-side references, but the pages survive until munmap
    drops the VMA-side references.
    
    Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
    io_put_bl -> io_free_region -> release_pages drops the only references
    and the pages return to the buddy allocator while the user's VMA still
    has vm_start pointing into them.  The user can then write into whatever
    the allocator hands out next.
    
    Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
    release them via vm_ops->close.  NOMMU's delete_vma() calls vma_close()
    which runs ->close on munmap.
    
    This also incidentally addresses the duplicate-vm_start case: two mmaps
    of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
    With page refs taken per mmap, the second mmap takes its own refs and
    the pages survive until both mmaps are closed.  The nommu rb-tree BUG_ON
    on duplicate vm_start is a separate mm/nommu.c concern (it should share
    the existing region rather than BUG), but the page lifetime is now
    correct.
    
    Cc: Jens Axboe <axboe@kernel.dk>
    Reported-by: Anthropic
    Assisted-by: gkh_clanker_t1000
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
    [axboe: get rid of region lookup, just iterate pages in vma]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a..4f9b439319c4 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,53 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #else /* !CONFIG_MMU */
 
+/*
+ * Drop the pages that were initially referenced and added in
+ * io_uring_mmap(). We cannot have had a mremap() as that isn't supported,
+ * hence the vma should be identical to the one we initially referenced and
+ * mapped, and partial unmaps and splitting isn't possible on a file backed
+ * mapping.
+ */
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	unsigned long index;
+
+	for (index = vma->vm_start; index < vma->vm_end; index += PAGE_SIZE)
+		put_page(virt_to_page((void *) index));
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
 int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	if ((vma->vm_end - vma->vm_start) !=
+	    (unsigned long) region->nr_pages << PAGE_SHIFT)
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
 }
 
 unsigned int io_uring_nommu_mmap_capabilities(struct file *file)

-- 
Jens Axboe
