Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
Date: Thu, 23 Apr 2026 14:59:58 +0200
Message-ID: <2026042314-traffic-riverbank-d9a1@gregkh>
In-Reply-To: <dee5b7e8-864d-48b5-99cf-282eedaba927@kernel.org>

On Thu, Apr 23, 2026 at 02:47:23PM +0200, David Hildenbrand (Arm) wrote:
> On 4/23/26 14:31, Greg Kroah-Hartman wrote:
> > The !CONFIG_MMU implementation of __get_user_pages_locked() takes a bare
> > get_page() reference for each page regardless of foll_flags:
> > 	if (pages[i])
> > 		get_page(pages[i]);
> > 
> > This is reached from pin_user_pages*() with FOLL_PIN set.
> > unpin_user_page() is shared between MMU and NOMMU configurations and
> > unconditionally calls gup_put_folio(..., FOLL_PIN), which subtracts
> > GUP_PIN_COUNTING_BIAS (1024) from the folio refcount.
> > 
> > This means that pin adds 1, and then unpin will subtract 1024.
> > 
> > If a user maps a page (refcount 1), registers it 1023 times as an
> > io_uring fixed buffer (1023 pin_user_pages calls -> refcount 1024), then
> > unregisters: the first unpin_user_page subtracts 1024, refcount hits 0,
> > the page is freed and returned to the buddy allocator.  The remaining
> > 1022 unpins write into whatever was reallocated, and the user's VMA
> > still maps the freed page (NOMMU has no MMU to invalidate it).
> > Reallocating the page for an io_uring pbuf_ring then lets userspace
> > corrupt the new owner's data through the stale mapping.
> > 
> > Use try_grab_folio() which adds GUP_PIN_COUNTING_BIAS for FOLL_PIN and 1
> > for FOLL_GET, mirroring the CONFIG_MMU path so pin and unpin are
> > symmetric.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Reported-by: Anthropic
> > Assisted-by: gkh_clanker_t1000
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> > My first foray into -mm, eeek!
> 
> Oh, nommu ... what a great use of our time.

Yeah, tell me about it.  I have been cursing a specific company's name a
lot these past days...

> I was briefly wondering if we want to add a Fixes: ... but then, this was likely
> broken for years and nobody cared so far in practice.

Agreed.

> > 
> > Anyway, this was a crazy report sent to me, and I knocked up this
> > change, and I have a reproducer if people need/want to see that as well
> > (it's for nommu systems, so be wary of it.)
> 
> [...]
> 
> > -				get_page(pages[i]);
> > +			if (pages[i]) {
> > +				/*
> > +				 * pin_user_pages*() arrives here with FOLL_PIN
> > +				 * set; unpin_user_page() (which is not
> > +				 * !CONFIG_MMU-specific) calls
> > +				 * gup_put_folio(..., FOLL_PIN) which subtracts
> > +				 * GUP_PIN_COUNTING_BIAS (1024).  A bare
> > +				 * get_page() here adds only 1, so 1023 pins on
> > +				 * a fresh page bring refcount to 1024 and a
> > +				 * single unpin then frees it out from under the
> > +				 * remaining 1022 pins and any live VMA
> > +				 * mappings. Use the same grab path as the MMU
> > +				 * implementation so pin and unpin are
> > +				 * symmetric.
> > +				 */
> 
> Yeah, drop all that. Especially the hardcoded 1024/1022 is just screaming for
> trouble longterm.

Ok, will drop!

> It just follows what we do everywhere else (e.g., follow_page_pte()).
> 
> 
> > +				if (try_grab_folio(page_folio(pages[i]), 1,
> > +						   foll_flags)) {
> > +					pages[i] = NULL;
> > +					break;
> > +				}
> > +			}
> 
> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
> 
> I know, I know, nobody cares. But if we touch it, we might just want to return
> the error we get from try_grab_folio().

So just abort here and return it?  No, that will not work, returning
directly would jump around a lock.  How about something like this horrid
thing, which also adds back the relevant unlikely() to match the other
call sites:


diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..8fa5b37be8b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 	struct vm_area_struct *vma;
 	bool must_unlock = false;
 	vm_flags_t vm_flags;
+	int ret;
 	long i;
 
 	if (!nr_pages)
@@ -2019,8 +2020,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 
 		if (pages) {
 			pages[i] = virt_to_page((void *)start);
-			if (pages[i])
-				get_page(pages[i]);
+			if (pages[i]) {
+				ret = try_grab_folio(page_folio(pages[i]), 1,
+						     foll_flags);
+				if (unlikely(ret)) {
+					pages[i] = NULL;
+					i = ret;
+					break;
+				}
+			}
 		}
 
 		start = (start + PAGE_SIZE) & PAGE_MASK;
