* [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
@ 2026-04-23 12:31 Greg Kroah-Hartman
2026-04-23 12:47 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-23 12:31 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Greg Kroah-Hartman, Andrew Morton,
David Hildenbrand, Jason Gunthorpe, John Hubbard, Peter Xu
The !CONFIG_MMU implementation of __get_user_pages_locked() takes a bare
get_page() reference for each page regardless of foll_flags:
	if (pages[i])
		get_page(pages[i]);
This is reached from pin_user_pages*() with FOLL_PIN set.
unpin_user_page() is shared between MMU and NOMMU configurations and
unconditionally calls gup_put_folio(..., FOLL_PIN), which subtracts
GUP_PIN_COUNTING_BIAS (1024) from the folio refcount.
This means that pin adds 1, and then unpin will subtract 1024.
If a user maps a page (refcount 1), registers it 1023 times as an
io_uring fixed buffer (1023 pin_user_pages calls -> refcount 1024), then
unregisters: the first unpin_user_page subtracts 1024, refcount hits 0,
the page is freed and returned to the buddy allocator. The remaining
1022 unpins write into whatever was reallocated, and the user's VMA
still maps the freed page (NOMMU has no MMU to invalidate it).
Reallocating the page for an io_uring pbuf_ring then lets userspace
corrupt the new owner's data through the stale mapping.
Use try_grab_folio() which adds GUP_PIN_COUNTING_BIAS for FOLL_PIN and 1
for FOLL_GET, mirroring the CONFIG_MMU path so pin and unpin are
symmetric.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
My first foray into -mm, eeek!
Anyway, this was a crazy report sent to me, and I knocked up this
change, and I have a reproducer if people need/want to see that as well
(it's for nommu systems, so be wary of it.)
If I should drop the huge comment, I'll be glad to respin, but I thought
it was good to try to document this somewhere as it didn't seem obvious,
at least to me, what was going on...
thanks,
greg k-h
mm/gup.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..c8744fb8a395 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2019,8 +2019,27 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 
 		if (pages) {
 			pages[i] = virt_to_page((void *)start);
-			if (pages[i])
-				get_page(pages[i]);
+			if (pages[i]) {
+				/*
+				 * pin_user_pages*() arrives here with FOLL_PIN
+				 * set; unpin_user_page() (which is not
+				 * !CONFIG_MMU-specific) calls
+				 * gup_put_folio(..., FOLL_PIN) which subtracts
+				 * GUP_PIN_COUNTING_BIAS (1024). A bare
+				 * get_page() here adds only 1, so 1023 pins on
+				 * a fresh page bring refcount to 1024 and a
+				 * single unpin then frees it out from under the
+				 * remaining 1022 pins and any live VMA
+				 * mappings. Use the same grab path as the MMU
+				 * implementation so pin and unpin are
+				 * symmetric.
+				 */
+				if (try_grab_folio(page_folio(pages[i]), 1,
+						   foll_flags)) {
+					pages[i] = NULL;
+					break;
+				}
+			}
 		}
 
 		start = (start + PAGE_SIZE) & PAGE_MASK;
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 12:31 [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked() Greg Kroah-Hartman
@ 2026-04-23 12:47 ` David Hildenbrand (Arm)
2026-04-23 12:59 ` Greg Kroah-Hartman
2026-04-23 13:00 ` David Hildenbrand (Arm)
0 siblings, 2 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-23 12:47 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-mm
Cc: linux-kernel, Andrew Morton, Jason Gunthorpe, John Hubbard,
Peter Xu
On 4/23/26 14:31, Greg Kroah-Hartman wrote:
> The !CONFIG_MMU implementation of __get_user_pages_locked() takes a bare
> get_page() reference for each page regardless of foll_flags:
> if (pages[i])
> get_page(pages[i]);
>
> This is reached from pin_user_pages*() with FOLL_PIN set.
> unpin_user_page() is shared between MMU and NOMMU configurations and
> unconditionally calls gup_put_folio(..., FOLL_PIN), which subtracts
> GUP_PIN_COUNTING_BIAS (1024) from the folio refcount.
>
> This means that pin adds 1, and then unpin will subtract 1024.
>
> If a user maps a page (refcount 1), registers it 1023 times as an
> io_uring fixed buffer (1023 pin_user_pages calls -> refcount 1024), then
> unregisters: the first unpin_user_page subtracts 1024, refcount hits 0,
> the page is freed and returned to the buddy allocator. The remaining
> 1022 unpins write into whatever was reallocated, and the user's VMA
> still maps the freed page (NOMMU has no MMU to invalidate it).
> Reallocating the page for an io_uring pbuf_ring then lets userspace
> corrupt the new owner's data through the stale mapping.
>
> Use try_grab_folio() which adds GUP_PIN_COUNTING_BIAS for FOLL_PIN and 1
> for FOLL_GET, mirroring the CONFIG_MMU path so pin and unpin are
> symmetric.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Peter Xu <peterx@redhat.com>
> Reported-by: Anthropic
> Assisted-by: gkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> My first foray into -mm, eeek!
Oh, nommu ... what a great use of our time.
I was briefly wondering if we want to add a Fixes: ... but then, this was likely
broken for years and nobody cared so far in practice.
>
> Anyway, this was a crazy report sent to me, and I knocked up this
> change, and I have a reproducer if people need/want to see that as well
> (it's for nommu systems, so be wary of it.)
[...]
> - get_page(pages[i]);
> + if (pages[i]) {
> + /*
> + * pin_user_pages*() arrives here with FOLL_PIN
> + * set; unpin_user_page() (which is not
> + * !CONFIG_MMU-specific) calls
> + * gup_put_folio(..., FOLL_PIN) which subtracts
> + * GUP_PIN_COUNTING_BIAS (1024). A bare
> + * get_page() here adds only 1, so 1023 pins on
> + * a fresh page bring refcount to 1024 and a
> + * single unpin then frees it out from under the
> + * remaining 1022 pins and any live VMA
> + * mappings. Use the same grab path as the MMU
> + * implementation so pin and unpin are
> + * symmetric.
> + */
Yeah, drop all that. Especially the hardcoded 1024/1022 is just screaming for
trouble longterm.
It just follows what we do everywhere else (e.g., follow_page_pte()).
> + if (try_grab_folio(page_folio(pages[i]), 1,
> + foll_flags)) {
> + pages[i] = NULL;
> + break;
> + }
> + }
If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
I know, I know, nobody cares. But if we touch it, we might just want to return
the error we get from try_grab_folio().
--
Cheers,
David
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 12:47 ` David Hildenbrand (Arm)
@ 2026-04-23 12:59 ` Greg Kroah-Hartman
2026-04-23 13:04 ` David Hildenbrand (Arm)
2026-04-23 13:00 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-23 12:59 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, Andrew Morton, Jason Gunthorpe,
John Hubbard, Peter Xu
On Thu, Apr 23, 2026 at 02:47:23PM +0200, David Hildenbrand (Arm) wrote:
> On 4/23/26 14:31, Greg Kroah-Hartman wrote:
> > The !CONFIG_MMU implementation of __get_user_pages_locked() takes a bare
> > get_page() reference for each page regardless of foll_flags:
> > if (pages[i])
> > get_page(pages[i]);
> >
> > This is reached from pin_user_pages*() with FOLL_PIN set.
> > unpin_user_page() is shared between MMU and NOMMU configurations and
> > unconditionally calls gup_put_folio(..., FOLL_PIN), which subtracts
> > GUP_PIN_COUNTING_BIAS (1024) from the folio refcount.
> >
> > This means that pin adds 1, and then unpin will subtract 1024.
> >
> > If a user maps a page (refcount 1), registers it 1023 times as an
> > io_uring fixed buffer (1023 pin_user_pages calls -> refcount 1024), then
> > unregisters: the first unpin_user_page subtracts 1024, refcount hits 0,
> > the page is freed and returned to the buddy allocator. The remaining
> > 1022 unpins write into whatever was reallocated, and the user's VMA
> > still maps the freed page (NOMMU has no MMU to invalidate it).
> > Reallocating the page for an io_uring pbuf_ring then lets userspace
> > corrupt the new owner's data through the stale mapping.
> >
> > Use try_grab_folio() which adds GUP_PIN_COUNTING_BIAS for FOLL_PIN and 1
> > for FOLL_GET, mirroring the CONFIG_MMU path so pin and unpin are
> > symmetric.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Reported-by: Anthropic
> > Assisted-by: gkh_clanker_t1000
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> > My first foray into -mm, eeek!
>
> Oh, nommu ... what a great use of our time.
Yeah, tell me about it. I have been cursing a specific company's name a
lot these past days...
> I was briefly wondering if we want to add a Fixes: ... but then, this was likely
> broken for years and nobody cared so far in practice.
Agreed.
> >
> > Anyway, this was a crazy report sent to me, and I knocked up this
> > change, and I have a reproducer if people need/want to see that as well
> > (it's for nommu systems, so be wary of it.)
>
> [...]
>
> > - get_page(pages[i]);
> > + if (pages[i]) {
> > + /*
> > + * pin_user_pages*() arrives here with FOLL_PIN
> > + * set; unpin_user_page() (which is not
> > + * !CONFIG_MMU-specific) calls
> > + * gup_put_folio(..., FOLL_PIN) which subtracts
> > + * GUP_PIN_COUNTING_BIAS (1024). A bare
> > + * get_page() here adds only 1, so 1023 pins on
> > + * a fresh page bring refcount to 1024 and a
> > + * single unpin then frees it out from under the
> > + * remaining 1022 pins and any live VMA
> > + * mappings. Use the same grab path as the MMU
> > + * implementation so pin and unpin are
> > + * symmetric.
> > + */
>
> Yeah, drop all that. Especially the hardcoded 1024/1022 is just screaming for
> trouble longterm.
Ok, will drop!
> It just follows what we do everywhere else (e.g., follow_page_pte()).
>
>
> > + if (try_grab_folio(page_folio(pages[i]), 1,
> > + foll_flags)) {
> > + pages[i] = NULL;
> > + break;
> > + }
> > + }
>
> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
>
> I know, I know, nobody cares. But if we touch it, we might just want to return
> the error we get from try_grab_folio().
So just abort here and return it? No, that will not work, there's a
lock we would jump around. How about something like this horrid thing,
adding back in the relevant unlikely() to match the other calls like
this:
diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..8fa5b37be8b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 	struct vm_area_struct *vma;
 	bool must_unlock = false;
 	vm_flags_t vm_flags;
+	int ret;
 	long i;
 
 	if (!nr_pages)
@@ -2019,8 +2020,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 
 		if (pages) {
 			pages[i] = virt_to_page((void *)start);
-			if (pages[i])
-				get_page(pages[i]);
+			if (pages[i]) {
+				ret = try_grab_folio(page_folio(pages[i]), 1,
+						     foll_flags);
+				if (unlikely(ret)) {
+					pages[i] = NULL;
+					i = ret;
+					break;
+				}
+			}
 		}
 
 		start = (start + PAGE_SIZE) & PAGE_MASK;
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 12:47 ` David Hildenbrand (Arm)
2026-04-23 12:59 ` Greg Kroah-Hartman
@ 2026-04-23 13:00 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-23 13:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-mm
Cc: linux-kernel, Andrew Morton, Jason Gunthorpe, John Hubbard,
Peter Xu
On 4/23/26 14:47, David Hildenbrand (Arm) wrote:
> On 4/23/26 14:31, Greg Kroah-Hartman wrote:
>> The !CONFIG_MMU implementation of __get_user_pages_locked() takes a bare
>> get_page() reference for each page regardless of foll_flags:
>> if (pages[i])
>> get_page(pages[i]);
>>
>> This is reached from pin_user_pages*() with FOLL_PIN set.
>> unpin_user_page() is shared between MMU and NOMMU configurations and
>> unconditionally calls gup_put_folio(..., FOLL_PIN), which subtracts
>> GUP_PIN_COUNTING_BIAS (1024) from the folio refcount.
>>
>> This means that pin adds 1, and then unpin will subtract 1024.
>>
>> If a user maps a page (refcount 1), registers it 1023 times as an
>> io_uring fixed buffer (1023 pin_user_pages calls -> refcount 1024), then
>> unregisters: the first unpin_user_page subtracts 1024, refcount hits 0,
>> the page is freed and returned to the buddy allocator. The remaining
>> 1022 unpins write into whatever was reallocated, and the user's VMA
>> still maps the freed page (NOMMU has no MMU to invalidate it).
>> Reallocating the page for an io_uring pbuf_ring then lets userspace
>> corrupt the new owner's data through the stale mapping.
>>
>> Use try_grab_folio() which adds GUP_PIN_COUNTING_BIAS for FOLL_PIN and 1
>> for FOLL_GET, mirroring the CONFIG_MMU path so pin and unpin are
>> symmetric.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Reported-by: Anthropic
>> Assisted-by: gkh_clanker_t1000
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>> My first foray into -mm, eeek!
>
> Oh, nommu ... what a great use of our time.
>
> I was briefly wondering if we want to add a Fixes: ... but then, this was likely
> broken for years and nobody cared so far in practice.
>
>>
>> Anyway, this was a crazy report sent to me, and I knocked up this
>> change, and I have a reproducer if people need/want to see that as well
>> (it's for nommu systems, so be wary of it.)
>
> [...]
>
>> - get_page(pages[i]);
>> + if (pages[i]) {
>> + /*
>> + * pin_user_pages*() arrives here with FOLL_PIN
>> + * set; unpin_user_page() (which is not
>> + * !CONFIG_MMU-specific) calls
>> + * gup_put_folio(..., FOLL_PIN) which subtracts
>> + * GUP_PIN_COUNTING_BIAS (1024). A bare
>> + * get_page() here adds only 1, so 1023 pins on
>> + * a fresh page bring refcount to 1024 and a
>> + * single unpin then frees it out from under the
>> + * remaining 1022 pins and any live VMA
>> + * mappings. Use the same grab path as the MMU
>> + * implementation so pin and unpin are
>> + * symmetric.
>> + */
>
> Yeah, drop all that. Especially the hardcoded 1024/1022 is just screaming for
> trouble longterm.
>
> It just follows what we do everywhere else (e.g., follow_page_pte()).
>
>
>> + if (try_grab_folio(page_folio(pages[i]), 1,
>> + foll_flags)) {
>> + pages[i] = NULL;
>> + break;
>> + }
>> + }
>
> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
>
> I know, I know, nobody cares. But if we touch it, we might just want to return
> the error we get from try_grab_folio().
>
BTW, looking into this, I am not sure if continuing on !pages[i] is a sane thing
to do.
IIRC, we are not supposed to return a NULL page pointer from this function.
We support it in unpin_user_pages() only for internal purposes.
unpin_user_pages_dirty_lock() does not support/expect that.
That should likely be an -EFAULT (if the first page).
--
Cheers,
David
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 12:59 ` Greg Kroah-Hartman
@ 2026-04-23 13:04 ` David Hildenbrand (Arm)
2026-04-23 13:11 ` Greg Kroah-Hartman
2026-04-23 13:42 ` Greg Kroah-Hartman
0 siblings, 2 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-23 13:04 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-mm, linux-kernel, Andrew Morton, Jason Gunthorpe,
John Hubbard, Peter Xu
>> It just follows what we do everywhere else (e.g., follow_page_pte()).
>>
>>
>>> + if (try_grab_folio(page_folio(pages[i]), 1,
>>> + foll_flags)) {
>>> + pages[i] = NULL;
>>> + break;
>>> + }
>>> + }
>>
>> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
>>
>> I know, I know, nobody cares. But if we touch it, we might just want to return
>> the error we get from try_grab_folio().
>
> So just abort here and return it? No, that will not work, there's a
> lock we would jump around. How about something like this horrid thing,
> adding back in the relevant unlikely() to match the other calls like
> this:
>
>
> diff --git a/mm/gup.c b/mm/gup.c
> index ad9ded39609c..8fa5b37be8b7 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> struct vm_area_struct *vma;
> bool must_unlock = false;
> vm_flags_t vm_flags;
> + int ret;
> long i;
>
> if (!nr_pages)
> @@ -2019,8 +2020,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
>
> if (pages) {
> pages[i] = virt_to_page((void *)start);
> - if (pages[i])
> - get_page(pages[i]);
> + if (pages[i]) {
> + ret = try_grab_folio(page_folio(pages[i]), 1,
> + foll_flags);
> + if (unlikely(ret)) {
> + pages[i] = NULL;
> + i = ret;
> + break;
So, we have to report the old i in case of an error.
Assuming we want to also return -EFAULT on !pages[i] -- which I think we want,
maybe the following (uncompiled and untested)?
diff --git a/mm/gup.c b/mm/gup.c
index ad9ded39609c..d00ff8d988fa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 	struct vm_area_struct *vma;
 	bool must_unlock = false;
 	vm_flags_t vm_flags;
+	int ret, err = -EFAULT;
 	long i;
 
 	if (!nr_pages)
@@ -2019,8 +2020,14 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 
 		if (pages) {
 			pages[i] = virt_to_page((void *)start);
-			if (pages[i])
-				get_page(pages[i]);
+			if (!pages[i])
+				break;
+			ret = try_grab_folio(page_folio(pages[i]), 1, foll_flags);
+			if (ret) {
+				pages[i] = NULL;
+				err = ret;
+				break;
+			}
 		}
 
 		start = (start + PAGE_SIZE) & PAGE_MASK;
@@ -2031,7 +2038,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 
 		*locked = 0;
 	}
 
-	return i ? : -EFAULT;
+	return i ? : err;
 }
 #endif /* !CONFIG_MMU */
--
Cheers,
David
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 13:04 ` David Hildenbrand (Arm)
@ 2026-04-23 13:11 ` Greg Kroah-Hartman
2026-04-23 13:42 ` Greg Kroah-Hartman
1 sibling, 0 replies; 7+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-23 13:11 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, Andrew Morton, Jason Gunthorpe,
John Hubbard, Peter Xu
On Thu, Apr 23, 2026 at 03:04:56PM +0200, David Hildenbrand (Arm) wrote:
>
> >> It just follows what we do everywhere else (e.g., follow_page_pte()).
> >>
> >>
> >>> + if (try_grab_folio(page_folio(pages[i]), 1,
> >>> + foll_flags)) {
> >>> + pages[i] = NULL;
> >>> + break;
> >>> + }
> >>> + }
> >>
> >> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
> >>
> >> I know, I know, nobody cares. But if we touch it, we might just want to return
> >> the error we get from try_grab_folio().
> >
> > So just abort here and return it? No, that will not work, there's a
> > lock we would jump around. How about something like this horrid thing,
> > adding back in the relevant unlikely() to match the other calls like
> > this:
> >
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index ad9ded39609c..8fa5b37be8b7 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> > struct vm_area_struct *vma;
> > bool must_unlock = false;
> > vm_flags_t vm_flags;
> > + int ret;
> > long i;
> >
> > if (!nr_pages)
> > @@ -2019,8 +2020,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> >
> > if (pages) {
> > pages[i] = virt_to_page((void *)start);
> > - if (pages[i])
> > - get_page(pages[i]);
> > + if (pages[i]) {
> > + ret = try_grab_folio(page_folio(pages[i]), 1,
> > + foll_flags);
> > + if (unlikely(ret)) {
> > + pages[i] = NULL;
> > + i = ret;
> > + break;
>
> So, we have to report the old i in case of an error.
>
> Assuming we want to also return -EFAULT on !pages[i] -- which i think we want,
> maybe the following (uncompiled and untested)?
>
> diff --git a/mm/gup.c b/mm/gup.c
> index ad9ded39609c..d00ff8d988fa 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> struct vm_area_struct *vma;
> bool must_unlock = false;
> vm_flags_t vm_flags;
> + int ret, err = -EFAULT;
> long i;
>
> if (!nr_pages)
> @@ -2019,8 +2020,14 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
>
> if (pages) {
> pages[i] = virt_to_page((void *)start);
> - if (pages[i])
> - get_page(pages[i]);
> + if (!pages[i])
> + break;
> + ret = try_grab_folio(page_folio(pages[i]), 1, foll_flags);
> + if (ret) {
> + pages[i] = NULL;
> + err = ret;
> + break
> + }
> }
>
> start = (start + PAGE_SIZE) & PAGE_MASK;
> @@ -2031,7 +2038,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> *locked = 0;
> }
>
> - return i ? : -EFAULT;
> + return i ? : err;
> }
> #endif /* !CONFIG_MMU */
Cool, let me throw this at my reproducer and see how it goes!
Ugh, nommu...
* Re: [PATCH] mm/gup: honour FOLL_PIN in NOMMU __get_user_pages_locked()
2026-04-23 13:04 ` David Hildenbrand (Arm)
2026-04-23 13:11 ` Greg Kroah-Hartman
@ 2026-04-23 13:42 ` Greg Kroah-Hartman
1 sibling, 0 replies; 7+ messages in thread
From: Greg Kroah-Hartman @ 2026-04-23 13:42 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, Andrew Morton, Jason Gunthorpe,
John Hubbard, Peter Xu
On Thu, Apr 23, 2026 at 03:04:56PM +0200, David Hildenbrand (Arm) wrote:
>
> >> It just follows what we do everywhere else (e.g., follow_page_pte()).
> >>
> >>
> >>> + if (try_grab_folio(page_folio(pages[i]), 1,
> >>> + foll_flags)) {
> >>> + pages[i] = NULL;
> >>> + break;
> >>> + }
> >>> + }
> >>
> >> If it fails on the first iteration, we return -EFAULT instead of -ENOMEM.
> >>
> >> I know, I know, nobody cares. But if we touch it, we might just want to return
> >> the error we get from try_grab_folio().
> >
> > So just abort here and return it? No, that will not work, there's a
> > lock we would jump around. How about something like this horrid thing,
> > adding back in the relevant unlikely() to match the other calls like
> > this:
> >
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index ad9ded39609c..8fa5b37be8b7 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> > struct vm_area_struct *vma;
> > bool must_unlock = false;
> > vm_flags_t vm_flags;
> > + int ret;
> > long i;
> >
> > if (!nr_pages)
> > @@ -2019,8 +2020,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> >
> > if (pages) {
> > pages[i] = virt_to_page((void *)start);
> > - if (pages[i])
> > - get_page(pages[i]);
> > + if (pages[i]) {
> > + ret = try_grab_folio(page_folio(pages[i]), 1,
> > + foll_flags);
> > + if (unlikely(ret)) {
> > + pages[i] = NULL;
> > + i = ret;
> > + break;
>
> So, we have to report the old i in case of an error.
>
> Assuming we want to also return -EFAULT on !pages[i] -- which i think we want,
> maybe the following (uncompiled and untested)?
>
> diff --git a/mm/gup.c b/mm/gup.c
> index ad9ded39609c..d00ff8d988fa 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1983,6 +1983,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> struct vm_area_struct *vma;
> bool must_unlock = false;
> vm_flags_t vm_flags;
> + int ret, err = -EFAULT;
> long i;
>
> if (!nr_pages)
> @@ -2019,8 +2020,14 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
>
> if (pages) {
> pages[i] = virt_to_page((void *)start);
> - if (pages[i])
> - get_page(pages[i]);
> + if (!pages[i])
> + break;
> + ret = try_grab_folio(page_folio(pages[i]), 1, foll_flags);
> + if (ret) {
> + pages[i] = NULL;
> + err = ret;
> + break
> + }
> }
>
> start = (start + PAGE_SIZE) & PAGE_MASK;
> @@ -2031,7 +2038,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> *locked = 0;
> }
>
> - return i ? : -EFAULT;
> + return i ? : err;
> }
> #endif /* !CONFIG_MMU */
Yup, this passes my tests, let me go send a v2 with this version,
thanks!
greg k-h